Abstract

A framework for adaptive and non-adaptive statistical compressive sensing is developed, where a 
statistical model replaces the standard sparsity model of classical compressive sensing. We propose within 
this framework optimal task-specific sensing protocols specifically and jointly designed for classification 
and reconstruction. A two-step adaptive sensing paradigm is developed, where online sensing is applied to 
detect the signal class in the first step, followed by a reconstruction step adapted to the detected class and 
the observed samples. The approach is based on information theory, here tailored for Gaussian mixture 
models (GMMs), where an information-theoretic objective relationship between the sensed signals and a 
representation of the specific task of interest is maximized. Experimental results using synthetic signals, 
Landsat satellite attributes, and natural images of different sizes and with different noise levels show 
the improvements achieved using the proposed framework when compared to more standard sensing 
protocols. The underlying formulation can be applied beyond GMMs, at the price of higher mathematical 
and computational complexity. 
I. Introduction

Compressive sensing (CS) theory states that signals $x \in \mathbb{R}^N$ that have a sparse or compressible
representation on a dictionary $D$ can be sensed using far fewer linear measurements, $y = \Phi x \in \mathbb{R}^M$,
$M \ll N$, than those required by the Shannon-Nyquist theorem, with minimum loss of information [1]; it is
assumed that $x = D\alpha$, where $\alpha$ is sparse or nearly sparse. In addition to the sparsity of the signals, CS
requires that the sensing matrix $\Phi$ be as incoherent (uncorrelated) as possible with the dictionary [1]. A
key property in CS is the Restricted Isometry Property (RIP), which enforces incoherence and ensures
robustness of the reconstruction [1], [2]. In particular, it has been shown that random sensing matrices
satisfy the RIP with overwhelming probability [1], [3]-[5]. For specific types of signals (represented by
their associated, often learned, dictionary), deterministic sensing matrices can be designed [6], [7], or
better yet learned, leading to significant improvements over random sensing matrices [2], [8]-[11].

Off-the-shelf dictionaries are not flexible enough to represent the large variability found in natural
signals [12]. Learned overcomplete dictionaries can better capture this variability, but the optimization
over general unstructured overcomplete dictionaries is often expensive and unstable because
the search space grows combinatorially with the number of atoms (columns) in the dictionary [13],
[14]. Structured overcomplete (learned) dictionaries have been proposed to reduce the size of the search
space, improving the sparse representation of complex signals [13]-[16].

Reconstruction of CS signals is usually performed via nonlinear optimization strategies such as
regularized orthogonal matching pursuit (OMP) [17] and $\ell_1$ convex optimization [1], [3]-[5], [18]. A
piecewise linear inversion model (PLM) was recently introduced [13], [19] (see also [20]), based on 
the maximum a posteriori expectation-maximization method (MAP-EM) for signals following a learned 
statistical Gaussian Mixture Model (GMM), which is a case of structured sparsity. The PLM has been 
shown to be effective and computationally efficient to reconstruct signals degraded by noise, blurring, 
sub-sampling, or any other linear filters such as CS random matrices. Theoretical analysis [19], [20] 
(which mostly considers random sensing matrices) indicates numerous advantages of such a statistical 
model compared to standard deterministic sparsity models. 

The original CS framework advocates the use of non-adaptive linear measurements. Adaptive CS [21]- 
[24] has been recently introduced, where each new measurement uses the information obtained from the 
previous measurements, focusing on subspaces that are more likely to contain true signal components [22], 
[23]. A generalization of the adaptive CS theory is the adaptive task-specific imaging (ATSI) framework
[25], where the task can be reconstruction [26], [27], classification [28], [29], or target detection [25].
ATSI adaptively selects the new measurements that maximize the mutual information between the CS
signals and a representation of the specific task of interest (labels for instance in classification), thereby 
connecting information theory with compressed sensing. 

In [11], we extended [9] and proposed a non-adaptive (batch) sensing matrix for statistical CS (SCS) 
of Gaussian mixture models (GMMs), shown to outperform random sensing matrices. However, the non- 
adaptive sensing matrix proposed in [11] did not exploit the structure of the dictionary, nor the model 
employed. In this work, we first propose a new optimal sensing matrix specifically designed for SCS of 
GMMs. This sensing matrix can be used for classical non-adaptive CS, or as explained in Section IV, 
in the first step of a novel two-step adaptive statistical CS (ASCS) here introduced and described next. 

Inspired in part by ATSI [25], [28], we propose an adaptive two-step SCS framework, tailored to the 
learned GMM. In the first step, the task is classification/detection, where we can either use non-adaptive
measurements (computationally cheaper, see Sections III and IV) or add adaptive measurements online
(computationally more expensive, see Section IV) that maximize an information-theoretic objective
function between the CS signals and the classes (Gaussians), while considering the measurements made
so far (the measurements can be optimized in batch mode or one at a time). Once the Gaussian model has
been estimated in the first step, the measurements on the second step are chosen (in one offline optimally 
computed block) either from a non-adaptive block optimal for the proper Gaussian model, or from an 
adaptive block that maximizes the mutual information between the CS signals and the original signal 
(for reconstruction purposes), taking into account both the previous measurements and the detected class. 
Hence, several degrees of adaptivity are possible (Section IV), offering a great deal of flexibility. The 
computational complexity of the different options is presented as well. As we will later see, this two-step 
adaptive CS paradigm improves reconstruction accuracy, compared to using non-adaptive measurements. 
In addition, we use sequential hypothesis testing [28], [30], [31] to automatically determine when to
stop acquiring new measurements for classification purposes (first step) and switch to the
second step, where the focus is on reconstruction, or to stop the acquisition altogether if the detected
class is not of interest.

Related works, such as ATSI [25], [28], and more recently [27], [29], consider a single adaptive step. 
In particular [25], [28] use parametric models, where the variables of interest are considered as fixed 
unknown parameters, and the only source of variability is the additive random noise. Here, the signals are 
realizations of multidimensional Gaussian random variables, within the GMM. In [27], the authors present 
batch and adaptive CS matrices, based on information theory and its relationship with the minimum mean 
squared error (MMSE) [32], [33]. For the specific case of GMMs, additive white Gaussian noise, and 
a signal with a known Gaussian distribution, the optimal sensing matrix can be constructed by taking
the first $M$ eigenvectors of the corresponding covariance matrix [27], [34]. This result is included here
(see Section III), as a non-adaptive second step, where we already have an estimate of the identity of the
Gaussian (we do not know the identity of the Gaussian in the first step). In [27], [29], the authors use
general (Gaussian or not) signal models to obtain general sensing matrices, although most of the proposed
solutions require expensive Monte Carlo simulations in order to compute expectations. Alternatively, the
authors also propose to use a different measure of information, the Rényi entropy [35], elegantly leading
to a closed-form solution for the gradient of the Rényi entropy and the GMM. Here, we exploit a different
well-known measure of information ($\mu$-measure, see Section IV), and also provide a closed-form solution
for its gradient in the first step (class detection) and a non-iterative closed-form solution for the second 
step (reconstruction). The main differences of this work with the seminal works [25], [27]-[29] are the 
general two-step framework, the specific SCS models employed, and the mathematical simplicity of the 
derived equations resulting from the proposed measures. 

In summary, the main contribution of this work is to provide a variety of statistical CS options ranging 
from a simple non-adaptive block SCS framework for GMMs, to several CS configurations offering 
different degrees of adaptivity within the proposed two-step (model detection and then reconstruction) 
SCS framework. Most of these configurations can be done either offline or online. Despite the fact that 
we use here a GMM, the proposed approach can be extended to non-Gaussian models at the cost of 
higher mathematical and computational complexity. 

Section II briefly reviews the statistical compressive sensing (SCS) of GMMs framework and the 
associated PLM reconstruction. Section III introduces the optimal non-adaptive statistical CS approach 
for GMMs. Section IV presents the several adaptive options within the two-step adaptive statistical CS 
paradigm, focusing on the most adaptive cases. Section V presents the highlights of the experiments 
with synthetic and real signals (patches from natural images and Landsat satellite attributes), where the 
different degrees of adaptivity are compared. Conclusions of this work are presented in Section VI, and
the Appendix contains all the mathematical derivations that were not included in the text for clarity of 
the exposition. Numerous additional experimental results are included in the supplementary material. 
II. Statistical Compressive Sensing of Gaussian Mixture Models

Let us assume that there exist $G$ Gaussian distributions such that the signals of interest $x \in \mathbb{R}^N$ can
be modeled as a mixture of Gaussians

$$p(x) = \sum_{g=1}^{G} p(g)\,\mathcal{N}(x \mid \mu_g, \Sigma_g),$$

$$\mathcal{N}(x \mid \mu_g, \Sigma_g) = \frac{1}{\sqrt{(2\pi)^N |\Sigma_g|}} \exp\left(-\frac{1}{2}(x - \mu_g)^T \Sigma_g^{-1} (x - \mu_g)\right), \quad g \in \{1, \dots, G\}, \tag{1}$$

where $p(g)$ is the probability that $x$ is a realization of the $g$-th Gaussian distribution, and $\mu_g \in \mathbb{R}^N$, $\Sigma_g$
correspond respectively to the mean and $N \times N$ covariance matrix of the $g$-th Gaussian distribution.
Furthermore, let us assume that a given $x$ is associated with exactly one (one-block sparsity) of the $G$ mixture
components, and mixture component $g$ is selected with probability $p(g)$. While it is straightforward to
extend this to a mixture of two or more Gaussian distributions (beyond one-block sparsity), as indicated
in [13], [19], increasing the complexity of the model does not necessarily improve the reconstruction of
the signals.

The corresponding structured overcomplete dictionary for this model is given by [13]

$$D_{N \times GN} = [V_1 \dots V_G], \quad \Sigma_g = V_g \Lambda_g V_g^T, \quad g \in \{1, \dots, G\}, \tag{2}$$

based on the PCAs of the covariance matrices $\Sigma_g$. In this dictionary, $x$ is represented as

$$x = D\alpha = [V_1 \dots V_G]\,[0^T \dots \alpha_g^T \dots 0^T]^T = V_g \alpha_g, \tag{3}$$

where $\|\alpha\|_0 = \|\alpha_g\|_0 = L < N \ll GN$, and $L$ corresponds to the largest eigenvalues of $\Sigma_g$. Hence,
the signal representation can be very sparse in this overcomplete structured dictionary, where typically
$G \gg 1$.
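As a small numerical illustration of the structured dictionary in (2) and the one-block-sparse representation in (3), the following NumPy sketch builds $D$ from the PCA bases of randomly generated covariances. The helper names (`random_covariance`, `pca_basis`), the toy sizes, and the random covariances are assumptions made only for illustration; they are not part of the original experiments.

```python
import numpy as np

def random_covariance(n, rng):
    """Random symmetric positive-definite matrix (toy stand-in for a learned covariance)."""
    a = rng.standard_normal((n, n))
    return a @ a.T + n * np.eye(n)

def pca_basis(cov):
    """Eigenvectors (columns) and eigenvalues of cov, sorted by decreasing eigenvalue."""
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1]
    return vecs[:, order], vals[order]

rng = np.random.default_rng(0)
N, G, g = 6, 4, 2                        # signal size, number of Gaussians, active Gaussian
bases = [pca_basis(random_covariance(N, rng))[0] for _ in range(G)]
D = np.hstack(bases)                     # N x GN structured dictionary D = [V_1 ... V_G]

# One-block sparsity: only the coefficients of block g are non-zero.
alpha_g = rng.standard_normal(N)
alpha = np.zeros(G * N)
alpha[g * N:(g + 1) * N] = alpha_g
x = D @ alpha                            # equals V_g alpha_g

# Each V_g is orthonormal, so D D^T = sum_g V_g V_g^T = G * I_N.
gram = D @ D.T
```

The identity $DD^T = G\,I_N$ checked in the last line is the property that later reduces the RIP-type condition to a condition on the rows of the sensing matrix alone.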

Now, let $y = \Phi x + \eta$ be the CS signal, where $\Phi_{M \times N}$, $M \ll N$, is a given (fixed for now) sensing matrix,
and $\eta$ is additive zero-mean white Gaussian noise, $\eta \sim \mathcal{N}(0, \sigma^2 I)$. Following an adequate initialization
of the PCA basis [13] for the signals at hand, the signal can be reconstructed from its projections
using an iterative maximum a posteriori based Expectation-Maximization (MAP-EM) algorithm, which
simultaneously learns the GMM. In the E-step, a MAP estimate of the signal and the Gaussian (model)
is obtained, and in the M-step, the parameters of the Gaussian are updated. The first part of the E-step
corresponds to the MAP estimate of the signal, considering the Gaussian model $x \sim \mathcal{N}(0, \Sigma_g)$ [13],

$$\hat{\alpha}_g = \arg\min_{\alpha_g} \left\{ \|y - \Phi V_g \alpha_g\|_2^2 + \sigma^2 \alpha_g^T \Lambda_g^{-1} \alpha_g \right\}. \tag{4}$$
Notice that we assumed here zero-mean signals, which can be achieved simply by subtracting the Gaussian
mean. Equation (4) can be efficiently solved in closed form using the Wiener filter $W_g$, for each Gaussian
$g \in \{1, \dots, G\}$ [13],

$$\hat{\alpha}_g = W_g y, \quad W_g = \Lambda_g V_g^T \Phi^T \left(\Phi V_g \Lambda_g V_g^T \Phi^T + \sigma^2 I_M\right)^{-1}. \tag{5}$$

Computing $\hat{\alpha}_g$ for each $g \in \{1, \dots, G\}$, the best Gaussian $\hat{g}$ (model selection) can be found at this
step and from there an estimate of the signal, $\hat{x} = V_{\hat{g}} \hat{\alpha}_{\hat{g}}$, using (5).
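The closed-form Wiener solution can be checked numerically. The sketch below is a minimal illustration under assumed toy values (random covariance, random sensing matrix, and the sizes `N`, `M`, `sigma2`); it computes the filter with the diagonal eigenvalue matrix $\Lambda_g$ and evaluates the gradient of the objective in (4) at the result, which should vanish at the minimizer.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, sigma2 = 8, 3, 0.05                # assumed toy sizes and noise variance

# Toy zero-mean Gaussian model: Sigma = V Lambda V^T.
a = rng.standard_normal((N, N))
Sigma = a @ a.T + 0.1 * np.eye(N)
lam, V = np.linalg.eigh(Sigma)
Lam = np.diag(lam)

Phi = rng.standard_normal((M, N))        # any fixed sensing matrix
x = V @ (np.sqrt(lam) * rng.standard_normal(N))
y = Phi @ x + np.sqrt(sigma2) * rng.standard_normal(M)

# Wiener filter: W = Lambda V^T Phi^T (Phi V Lambda V^T Phi^T + sigma^2 I_M)^-1
A = Phi @ V
W = Lam @ A.T @ np.linalg.inv(A @ Lam @ A.T + sigma2 * np.eye(M))
alpha_hat = W @ y
x_hat = V @ alpha_hat                    # signal estimate in the original domain

# Gradient of ||y - A alpha||^2 + sigma^2 alpha^T Lambda^-1 alpha at alpha_hat.
grad = -2 * A.T @ (y - A @ alpha_hat) + 2 * sigma2 * np.linalg.solve(Lam, alpha_hat)
```

Since $V_g \Lambda_g V_g^T = \Sigma_g$, the same estimate can be written directly as $\hat{x} = \Sigma_g \Phi^T (\Phi \Sigma_g \Phi^T + \sigma^2 I_M)^{-1} y$, which avoids forming the eigendecomposition when only $\hat{x}$ is needed.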

In the M-step, the Gaussian parameters are updated (see [13] for a discussion on optimality of this),

$$\mu_g = \frac{1}{|S_g|} \sum_{i \in S_g} \hat{x}_i, \quad \Sigma_g = \frac{1}{|S_g|} \sum_{i \in S_g} (\hat{x}_i - \mu_g)(\hat{x}_i - \mu_g)^T, \tag{6}$$

where $S_g$ is the set of indices corresponding to the signals modeled best by the Gaussian $g$. Once we
have updated the Gaussian parameters, the PCA decomposition of the covariance matrices (2) updates
the dictionary. Usually, two iterations of the MAP-EM algorithm are enough for a given fixed $\Phi$.
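The M-step amounts to a per-class sample mean and covariance followed by a PCA. A minimal sketch is given below; the function name `m_step`, the array layout (one signal estimate per row), and the toy data are assumptions for illustration, and the code assumes every Gaussian has at least one assigned signal.

```python
import numpy as np

def m_step(x_hat, labels, G):
    """Update (mu_g, Sigma_g, V_g, lambda_g) from estimates x_hat (S x N) and labels (S,)."""
    params = []
    for g in range(G):
        members = x_hat[labels == g]                  # signals assigned to Gaussian g
        mu = members.mean(axis=0)
        centered = members - mu
        Sigma = centered.T @ centered / len(members)  # 1/|S_g| normalization
        lam, V = np.linalg.eigh(Sigma)
        order = np.argsort(lam)[::-1]                 # PCA: decreasing eigenvalues
        params.append((mu, Sigma, V[:, order], lam[order]))
    return params

rng = np.random.default_rng(2)
x_hat = rng.standard_normal((200, 4)) * np.array([3.0, 1.0, 0.5, 0.1])
labels = np.arange(200) % 2                           # toy assignment to G = 2 Gaussians
params = m_step(x_hat, labels, G=2)
```

The returned bases $V_g$ are the blocks of the updated structured dictionary.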

III. Non-Adaptive (Batch) Statistical Compressive Sensing 

A deterministic non-adaptive sensing matrix $\Phi$ for SCS of GMMs can be used for both batch processing
and adaptive SCS (ASCS). Indeed, a deterministic non-adaptive sensing matrix could be employed in 
the first step of the two-step adaptive SCS (Section IV), replacing the standard non-adaptive randomly 
constituted sensing matrix. As indicated in [11], a deterministic non-adaptive sensing matrix, optimal 
for unstructured dictionaries [9], already improves reconstruction with respect to a randomly constituted 
sensing matrix in SCS. However, using the optimal non-adaptive sensing matrix for structured dictionaries, 
proposed in [10], does not achieve better results in this case [11]. The reason is that these sensing 
matrices do not exploit the known dictionary structure (Equation (2)), nor the GMM employed. The 
sensing approach introduced next does exploit these important properties. 

We first encourage the RIP property by requiring that the columns of the equivalent dictionary $\Phi D$
be as orthogonal as possible [9], [10],

$$\hat{\Phi} = \arg\min_{\Phi} \left\{ \|D^T \Phi^T \Phi D - I_{GN}\|_F^2 \right\}. \tag{7}$$

As shown in [10],

$$\|D^T \Phi^T \Phi D - I_{GN}\|_F^2 = \|\Phi D D^T \Phi^T - I_M\|_F^2 + GN - M. \tag{8}$$
Since in CS, $GN - M > 0$, the minimum can be achieved by minimizing $\|\Phi D D^T \Phi^T - I_M\|_F^2$. For the
case of our SCS (Equation (2)),

$$\|\Phi D D^T \Phi^T - I_M\|_F^2 = \left\| \Phi \left( \sum_{g=1}^{G} V_g V_g^T \right) \Phi^T - I_M \right\|_F^2 = \|G \Phi \Phi^T - I_M\|_F^2. \tag{9}$$

Hence, in order to encourage an RIP-type property, our SCS model only requires that the rows of the
CS matrix be orthogonal, i.e., $\Phi \Phi^T = \frac{1}{G} I_M$. Since $\frac{1}{G}$ is just a normalization constant, from now on
and without any loss of generality, we will refer to this type of RIP condition as $\Phi \Phi^T = I_M$.

Since there are infinitely many matrices of size $M \times N$ with orthogonal rows, we are free to impose
additional constraints on the CS matrices. It is well-known [13], [19] that for a Gaussian signal, $x \sim
\mathcal{N}(\mu_g, \Sigma_g)$, if we choose $\Phi = [v_1^g \dots v_L^g]^T$, where $v_i^g$, $i \in \{1, \dots, L\}$, are the first $L$ eigenvectors of the
covariance matrix, then the linear estimate

$$\hat{x} = \Phi^T (\Phi x) = \sum_{i=1}^{L} \langle x, v_i^g \rangle v_i^g = \sum_{i=1}^{L} \alpha_i^g v_i^g \tag{10}$$

minimizes the mean square reconstruction error (MSE, we will later deal with other tasks as well),

$$MSE = E\|x - \hat{x}\|_2^2 = E\left\| x - \sum_{i=1}^{L} \langle x, v_i^g \rangle v_i^g \right\|_2^2 = \sum_{i=L+1}^{N} \lambda_i^g, \tag{11}$$

where $\lambda_i^g$ corresponds to the $i$-th eigenvalue of the covariance matrix of the $g$-th Gaussian. Hence, for a
given Gaussian $g$, we have

$$\Phi V_g = [I_M \; 0_{M,N-M}]. \tag{12}$$
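The MSE claim for a single known Gaussian can be verified without sampling, because for a zero-mean $x$ with covariance $\Sigma_g$ the expected squared error of a projection is a trace. The following sketch uses an assumed random covariance and toy sizes, and compares the eigenvector-based sensing matrix against a random matrix with orthonormal rows.

```python
import numpy as np

rng = np.random.default_rng(3)
N, L = 8, 3                                   # assumed toy sizes
a = rng.standard_normal((N, N))
Sigma = a @ a.T                               # toy covariance of a zero-mean Gaussian
lam, V = np.linalg.eigh(Sigma)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]              # decreasing eigenvalues

def expected_mse(Phi, Sigma):
    """E||x - Phi^T Phi x||^2 for x ~ N(0, Sigma) and Phi with orthonormal rows."""
    residual = np.eye(Sigma.shape[0]) - Phi.T @ Phi
    return np.trace(residual @ Sigma @ residual.T)

Phi_pca = V[:, :L].T                          # rows = first L eigenvectors
Q = np.linalg.qr(rng.standard_normal((N, N)))[0]
Phi_rand = Q[:, :L].T                         # random orthonormal rows, for comparison

mse_pca = expected_mse(Phi_pca, Sigma)        # equals the sum of the N - L smallest eigenvalues
mse_rand = expected_mse(Phi_rand, Sigma)      # never smaller than mse_pca
```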

Given that in non-adaptive CS (or in the first step of the two-step adaptive CS), we do not know a
priori the corresponding Gaussian model of a given signal, we could try to satisfy (12) on average as

$$\hat{\Phi} = \arg\min_{\Phi} \left\{ \left\| \Phi \sum_{g=1}^{G} p(g) V_g - [I_M \; 0_{M,N-M}] \right\|_F^2 \right\}, \quad \text{s.t. } \Phi \Phi^T = I_M, \tag{13}$$

where we have also imposed the "RIP" (orthogonality) condition (9). Another possibility could be to
sense for the average Gaussian; this possibility is addressed in Section IV-A.

Let us define $\Phi = [I_M \; 0_{M,N-M}] B$, where $B$ is an orthonormal basis in $\mathbb{R}^N$, and $E = \sum_{g=1}^{G} p(g) V_g$
is the expected Gaussian basis. Notice that since $B$ is orthonormal, the rows of $\Phi$ are orthonormal too,
satisfying the condition (9). Equation (13) then simplifies to

$$\hat{B} = \arg\min_{B} \left\| [I_M \; 0_{M,N-M}] B E - [I_M \; 0_{M,N-M}] \right\|_F^2, \quad \text{s.t. } B B^T = I_N$$

$$\phantom{\hat{B}} = \arg\min_{B} \|B E - I_N\|_F^2, \quad \text{s.t. } B B^T = I_N. \tag{14}$$
The solution to (14) corresponds to a particular case of the generalized orthogonal Procrustes problem
[36], which in our case reduces to $B = W U^T$ (see derivation steps in Appendix A), where $E = U \Lambda W^T$
is the singular value decomposition of $E$. Hence, the solution to (13) is given by

$$\hat{\Phi} = [I_M \; 0_{M,N-M}]\, W U^T, \tag{15}$$

i.e., the first $M$ rows of $W U^T$ (see comments on computational complexity in the supplementary
material). We will denote the sensing matrix given in (15) as RIP-average of basis (RIP-AB).
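A minimal NumPy sketch of the RIP-AB construction follows. The function name `rip_ab`, the random orthonormal bases, and the priors are assumptions for illustration. Note that `numpy.linalg.svd` returns the third factor already transposed, so $W$ is recovered as its transpose.

```python
import numpy as np

def rip_ab(bases, priors, M):
    """First M rows of W U^T, where E = sum_g p(g) V_g = U diag(s) W^T."""
    E = sum(p * V for p, V in zip(priors, bases))
    U, s, Wt = np.linalg.svd(E)          # NumPy returns W^T as the third output
    B = Wt.T @ U.T                       # orthogonal Procrustes solution B = W U^T
    return B[:M], B, E, s

rng = np.random.default_rng(4)
N, G, M = 6, 3, 2                        # assumed toy sizes
bases = [np.linalg.qr(rng.standard_normal((N, N)))[0] for _ in range(G)]  # toy orthonormal V_g
priors = np.array([0.5, 0.3, 0.2])
Phi, B, E, s = rip_ab(bases, priors, M)
```

Because $B$ is orthogonal, the rows of the resulting `Phi` are orthonormal by construction, so the orthogonality condition is satisfied without a separate projection step.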

To conclude, in this section, we proposed a new non-adaptive batch CS matrix for GMMs that can 
also be used in the first step of the proposed two-step adaptive framework, as explained next. 

IV. Adaptive Task Driven Statistical Compressive Sensing 

As indicated in the introduction, the proposed two-step adaptive SCS algorithm uses $K < M \ll N$
measurements in the first step to identify the best Gaussian model for a given signal $x$, and in the second
step, it adds $M - K$ measurements using an adaptive or non-adaptive sensing matrix for that detected
Gaussian. Several configurations of this two-step adaptive SCS are possible, with different degrees of
adaptivity, as indicated in Table I. The corresponding computational complexity (see derivation details in
the supplementary material) is also shown in this table, where $S$ is the number of signals, $k$ the number
of MAP-EM iterations (Section II), and $\chi$ the number of steepest ascent iterations (Section IV-A).

In the first step, we can use $K$ non-adaptive measurements, which can be random or obtained with the non-adaptive
RIP-AB sensing matrix derived in Section III. Given that in the first step we are interested in feature
selection for classification purposes (detecting the Gaussian), we could also use non-adaptive information
discriminant analysis (IDA) [37] (see Section IV-A). Optionally, in the first step, $K$ adaptive measurements
can be made, based on our extension of IDA, called here adaptive information discriminant analysis
(AIDA) (Section IV-A). The number of adaptive measurements in the first step ($K$) can be automatically
determined using sequential hypothesis testing (SHT), as described in Section IV-C. In the second step, the
first $M - K$ eigenvectors of the corresponding covariance matrix (the Gaussian model has been identified
in the first step) can be used, which as indicated before (Sections I and III) are optimal (in the MSE
sense) for that Gaussian. However, this optimal sensing matrix disregards all previous $K$ measurements
made in the first step. Hence, we also provide here an optimal adaptive sensing matrix for the second step
(reconstruction) that maximizes the mutual information (MI) between the CS signals and the (unknown)
original signals, given that we now know the Gaussian model (estimated in the first step) and also the
previous measurements (see also [27]).
TABLE I

Two-Step Adaptive Statistical CS Configurations. Step 1 is classification/detection (unknown Gaussian); Step 2 is reconstruction (known Gaussian). See the supplementary material for details on the computational complexity.

| Step 1 sensing | Step 1 adaptive | Step 2 sensing | Step 2 adaptive | Computational complexity |
|---|---|---|---|---|
| Random | No | Optimal (MSE), non-adaptive (Section III) | No | $O(kSGMN^2)$ |
| RIP-AB (Section III) | No | Optimal (MSE), non-adaptive (Section III) | No | $O(kG(N^3 + SMN))$ |
| IDA (Section IV-A) | No | Optimal (MSE), non-adaptive (Section III) | No | $O(k(\chi M^3 + GSMN))$ |
| IDA (Section IV-A) | No | Optimal (MI), adaptive (Section IV-B) | Yes | $O(k(\chi M^3 + GSMN))$ |
| AIDA-SHT (Sections IV-A, IV-C) | Yes | Optimal (MI), adaptive (Section IV-B) | Yes | $O(kS(\chi M^4 + GMN^2))$ |


Each of the possibilities in Table I, with the exception of AIDA-SHT in the first step and the adaptive
optimal (in the MI sense) sensing matrix in the second step, has already been defined (IDA is a particular
case of AIDA). Hence, it remains only to introduce these new fully adaptive sensing protocols, which is
the subject of the next two sections.

Note that there are other configurations possible in this two-step framework that are not in Table I. 
For instance, using the optimal (MI) adaptive sensing matrix in the second step, when in the first step 
non-adaptive random or RIP-AB measurements were made, or using AIDA with a pre-defined number 
of steps (without SHT). However, we limit ourselves here to the indicated configurations, since these are 
the most representative, and from these configurations one can infer the behavior of others. This great 
deal of flexibility of the proposed two-step adaptive SCS framework makes it very attractive, providing 
different options to choose from depending on the application and available computational resources. The
most expensive computations required by these configurations (indicated in Table I) can be done
offline, before the actual sensing takes place, as detailed in the complexity analysis in the supplementary
material.

A. Step 1: Classification/Detection 

We extend the information discriminant analysis (IDA) framework [37], which performs non-adaptive
linear measurements aimed at extracting the best features for classification, to an adaptive IDA (AIDA).
Let $y_{(k-1)} = [y_1 \dots y_{k-1}]$ be the previous (known) $k - 1$ measurements of size $b \ge 1$ each, and
$y_k$ the (unknown) next $k$-th measurements of size $b$. Also, let $y_{(k-1)} = \Phi_{(k-1)} x$, where $\Phi_{(k-1)} =
[\Phi_1^T \dots \Phi_{k-1}^T]^T$ is the $(k - 1)b \times N$ (known) sensing matrix used so far, and $\Phi_k$ the $b \times N$ (unknown)
sensing matrix for the next $b$ measurements, i.e., $y_k = \Phi_k x$. Note that consistent with this notation,
$y_{(k)} = \Phi_{(k)} x$, which corresponds to all measurements made so far plus the new ones.

We want to find $\Phi_k$ that maximizes the mutual information (MI) between the new measurements and
the unknown Gaussian class $g \in \{1, \dots, G\}$, given all previous measurements, $I(y_k; g | y_{(k-1)})$,

$$\hat{\Phi}_k = \arg\max_{\Phi_k} I(y_k; g | y_{(k-1)}), \quad \text{s.t. } \Phi_k \Phi_k^T = I_b, \tag{16}$$

where we have also imposed orthonormality following the RIP condition (see Section III), also constraining
the amount of energy required by the sensor [27], [28], [37]. By definition of the conditional mutual
information [35],

$$I(y_k; g | y_{(k-1)}) = E_{y_k, g, y_{(k-1)}} \left\{ \log \frac{p(y_k, g | y_{(k-1)})}{p(y_k | y_{(k-1)})\, p(g | y_{(k-1)})} \right\}, \tag{17}$$

where $E\{\cdot\}$ represents expectation. From now on, and for simplicity, we omit the dependent variables
in the expectation.

Using Bayes' rule and logarithm properties, (17) can be expanded to (see details in Appendix B),

$$I(y_k; g | y_{(k-1)}) = H(p(y_{(k)})) - H(p(y_{(k)} | g)) - \left[ H(p(y_{(k-1)})) - H(p(y_{(k-1)} | g)) \right], \tag{18}$$

where $H(\cdot)$ is the differential entropy. Since the entropies within brackets do not depend on $\Phi_k$, we
treat them as constants and denote their contribution by $C(y_{(k-1)})$. Given that $y_{(k)} = \Phi_{(k)} x$ is a linear transformation
of a Gaussian ($g \in \{1, \dots, G\}$) random vector $x$, the class conditional probability is given by [38]

$$p(y_{(k)} | g) = \mathcal{N}(y_{(k)} \mid \mu_{y_{(k)}|g}, \Sigma_{y_{(k)}|g}), \tag{19}$$

where $\Sigma_{y_{(k)}|g} = \Phi_{(k)} \Sigma_g \Phi_{(k)}^T + \sigma^2 I$ and $\mu_{y_{(k)}|g} = \Phi_{(k)} \mu_g$. The entropy of $p(y_{(k)} | g)$ (see (18)) has a
known closed form given by [35], [39]

$$H(p(y_{(k)} | g)) = \frac{1}{2} \left[ kb(1 + \log(2\pi)) + \sum_{g=1}^{G} p(g) \log |\Sigma_{y_{(k)}|g}| \right], \tag{20}$$

where as before $p(g)$ is the probability that $x \sim \mathcal{N}(\mu_g, \Sigma_g)$.
On the other hand, the entropy of $p(y_{(k)})$,

$$H(p(y_{(k)})) = -\int p(y_{(k)}) \log p(y_{(k)})\, dy_{(k)}, \tag{21}$$

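in (18) has no closed form. Rather than numerically computing (21), we use here the approximation in
[37], [39], and define

$$\tilde{p}(y_{(k)}) = \frac{1}{\sqrt{(2\pi)^{kb} |\Sigma_{y_{(k)}}|}} \exp\left( -\frac{1}{2} (y_{(k)} - \mu_{y_{(k)}})^T \Sigma_{y_{(k)}}^{-1} (y_{(k)} - \mu_{y_{(k)}}) \right), \tag{22}$$

with $\mu_{y_{(k)}} = \sum_{g=1}^{G} p(g)\, \mu_{y_{(k)}|g}$ and $\Sigma_{y_{(k)}} = \sum_{g=1}^{G} p(g) \left[ \Sigma_{y_{(k)}|g} + (\mu_{y_{(k)}|g} - \mu_{y_{(k)}})(\mu_{y_{(k)}|g} - \mu_{y_{(k)}})^T \right]$,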
whose entropy has a closed form and provides an upper bound to the entropy of $p(y_{(k)})$ [39], i.e.,

$$H(p(y_{(k)})) \le -\int \tilde{p}(y_{(k)}) \log \tilde{p}(y_{(k)})\, dy_{(k)} = \frac{1}{2} \left[ kb(1 + \log(2\pi)) + \log |\Sigma_{y_{(k)}}| \right], \tag{23}$$

where $\tilde{p}$ is the Gaussian density defined in (22).

Since in our statistical CS model (Section II) we made $\mu_{y|g} = 0$, we have $\mu_{y_{(k)}} = 0$ and

$$\Sigma_{y_{(k)}} = \Phi_{(k)} \Sigma \Phi_{(k)}^T + \sigma^2 I, \quad \Sigma = \sum_{g=1}^{G} p(g) \Sigma_g. \tag{24}$$

Replacing (20) and (23) in (18), we obtain

$$I(y_k; g | y_{(k-1)}) \le \frac{1}{2} \left( \log |\Sigma_{y_{(k)}}| - \sum_{g=1}^{G} p(g) \log |\Sigma_{y_{(k)}|g}| \right) + C(y_{(k-1)}) = \mu(y_k; g | y_{(k-1)}), \tag{25}$$

where as in IDA [37], we have defined the right term as a $\mu$-measure, providing a more mathematically
tractable problem and also a general measure of class discrimination that extends beyond Gaussian
distributions. Hence, instead of maximizing the MI, which has no closed form, we propose to maximize
the $\mu$-measure, given by

$$\hat{\Phi}_k = \arg\max_{\Phi_k} \mu(y_k; g | y_{(k-1)}), \quad \text{s.t. } \Phi_k \Phi_k^T = I_b. \tag{26}$$

As shown in [37], $\mu$ is a class-separability measure that is optimal in the Bayes sense when the
noise is uncorrelated with the classes and the class differences are all in the signal subspace. After some
mathematical derivations (see details in Appendix C), the adaptive $\mu$-measure (26) can be rewritten as

$$\mu(y_k; g | y_{(k-1)}) = \frac{1}{2} \left( \log |\Phi_k P \Phi_k^T| - \sum_{g=1}^{G} p(g) \log |\Phi_k P_g \Phi_k^T| \right) + D(y_{(k-1)}), \tag{27}$$

$$P_g = \Sigma_g - \Sigma_g \Phi_{(k-1)}^T \Sigma_{y_{(k-1)}|g}^{-1} \Phi_{(k-1)} \Sigma_g + \sigma^2 I_N,$$

$$P = \Sigma - \Sigma \Phi_{(k-1)}^T \Sigma_{y_{(k-1)}}^{-1} \Phi_{(k-1)} \Sigma + \sigma^2 I_N,$$

where $D(y_{(k-1)})$ accounts for all the terms involving previous measurements, which do not depend on
$\Phi_k$. In (27), the adaptive $\mu$-measure depends not only on the (unknown) new block sensing matrix $\Phi_k$,$^1$
but also on the sensing matrix we have so far, $\Phi_{(k-1)}$. The solution to (26) can be obtained via steepest
ascent, with gradient given by (see useful vector derivatives in [37], [40])

$$\frac{\partial \mu(y_k; g | y_{(k-1)})}{\partial \Phi_k} = (\Phi_k P \Phi_k^T)^{-1} \Phi_k P - \sum_{g=1}^{G} p(g) (\Phi_k P_g \Phi_k^T)^{-1} \Phi_k P_g. \tag{28}$$

The condition $\Phi_k \Phi_k^T = I_b$ in (26) can be imposed at the end of each steepest ascent iteration as in [37].
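The matrices $P_g$ in (27) arise from a block-determinant (Schur complement) identity: adding a new block $\Phi_k$ with orthonormal rows changes $\log |\Sigma_{y_{(k)}|g}|$ by exactly $\log |\Phi_k P_g \Phi_k^T|$. The sketch below checks this numerically. The helper name `conditional_matrix` and the toy sizes are assumptions, and the identity in the noise term is taken to be $N \times N$ so that the sum is well defined (this is equivalent to adding $\sigma^2 I_b$ after projection, because $\Phi_k \Phi_k^T = I_b$).

```python
import numpy as np

def conditional_matrix(Sigma, Phi_prev, sigma2):
    """P = Sigma - Sigma Phi^T (Phi Sigma Phi^T + sigma^2 I)^-1 Phi Sigma + sigma^2 I_N."""
    N = Sigma.shape[0]
    S_prev = Phi_prev @ Sigma @ Phi_prev.T + sigma2 * np.eye(Phi_prev.shape[0])
    gain = Sigma @ Phi_prev.T @ np.linalg.inv(S_prev)
    return Sigma - gain @ Phi_prev @ Sigma + sigma2 * np.eye(N)

def logdet(m):
    return np.linalg.slogdet(m)[1]

rng = np.random.default_rng(5)
N, b, n_prev, sigma2 = 7, 2, 3, 0.1                 # assumed toy sizes
a = rng.standard_normal((N, N))
Sigma_g = a @ a.T + 0.5 * np.eye(N)
Q = np.linalg.qr(rng.standard_normal((N, N)))[0]
Phi_prev = Q[:, :n_prev].T                          # previous measurements (known)
Phi_k = Q[:, n_prev:n_prev + b].T                   # new block with orthonormal rows

P_g = conditional_matrix(Sigma_g, Phi_prev, sigma2)

# log|Sigma_{y(k)|g}| = log|Sigma_{y(k-1)|g}| + log|Phi_k P_g Phi_k^T|
Phi_all = np.vstack([Phi_prev, Phi_k])
lhs = logdet(Phi_all @ Sigma_g @ Phi_all.T + sigma2 * np.eye(n_prev + b))
rhs = (logdet(Phi_prev @ Sigma_g @ Phi_prev.T + sigma2 * np.eye(n_prev))
       + logdet(Phi_k @ P_g @ Phi_k.T))
```

The first term on the right-hand side depends only on previous measurements, which is why it can be absorbed into $D(y_{(k-1)})$.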

1 We consider the general case of block size $b \ge 1$. If $b = 1$, then we have adaptation for each new measurement.
Of course, for $k = 1$, we do not have previous measurements; hence, we can use the non-adaptive
IDA, which we repeat here for completeness of the presentation [37],

$$\mu(y_1; g) = \frac{1}{2} \left( \log |\Phi_1 \Sigma \Phi_1^T| - \sum_{g=1}^{G} p(g) \log |\Phi_1 \Sigma_g \Phi_1^T| \right). \tag{29}$$

The maximum $\mu$-measure, subject to $\Phi_1 \Phi_1^T = I_b$, is obtained by steepest ascent with gradient [37]

$$\frac{\partial \mu(y_1; g)}{\partial \Phi_1} = (\Phi_1 \Sigma \Phi_1^T)^{-1} \Phi_1 \Sigma - \sum_{g=1}^{G} p(g) (\Phi_1 \Sigma_g \Phi_1^T)^{-1} \Phi_1 \Sigma_g. \tag{30}$$

It is worth noting here the similarity between the IDA equations (29) and (30), and the AIDA equations
(27) and (28).$^2$ Indeed, the role that $\Sigma$ and $\Sigma_g$ play in IDA corresponds to $P$ and $P_g$, respectively, in
AIDA. Note also that IDA uses the average (expected) Gaussian ($\Sigma$, $\mu = 0$).
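A hedged sketch of the IDA steepest ascent is shown below. Several choices here are assumptions rather than the procedure of [37]: the rows are re-orthonormalized with a QR factorization, a step is accepted only if the $\mu$-measure increases (otherwise the step size is halved), and the covariances, priors, step size, and iteration count are toy values. The $\mu$-measure is unchanged by any invertible $b \times b$ mixing of the rows when the priors sum to one, so the QR step does not alter its value.

```python
import numpy as np

def logdet(m):
    return np.linalg.slogdet(m)[1]

def mu_measure(Phi, Sigma, Sigmas, priors):
    """mu-measure for a b x N matrix Phi (average covariance Sigma, class covariances Sigmas)."""
    return 0.5 * (logdet(Phi @ Sigma @ Phi.T)
                  - sum(p * logdet(Phi @ S @ Phi.T) for p, S in zip(priors, Sigmas)))

def mu_gradient(Phi, Sigma, Sigmas, priors):
    """Gradient of the mu-measure with respect to Phi."""
    grad = np.linalg.solve(Phi @ Sigma @ Phi.T, Phi @ Sigma)
    for p, S in zip(priors, Sigmas):
        grad = grad - p * np.linalg.solve(Phi @ S @ Phi.T, Phi @ S)
    return grad

def orthonormalize_rows(Phi):
    """Re-impose Phi Phi^T = I_b via QR (an assumed choice of projection)."""
    return np.linalg.qr(Phi.T)[0].T

def ida(Sigmas, priors, b, step=0.1, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    N = Sigmas[0].shape[0]
    Sigma = sum(p * S for p, S in zip(priors, Sigmas))     # average Gaussian
    Phi = orthonormalize_rows(rng.standard_normal((b, N)))
    history = [mu_measure(Phi, Sigma, Sigmas, priors)]
    for _ in range(iters):
        cand = orthonormalize_rows(Phi + step * mu_gradient(Phi, Sigma, Sigmas, priors))
        value = mu_measure(cand, Sigma, Sigmas, priors)
        if value > history[-1]:
            Phi = cand                                     # accept the ascent step
            history.append(value)
        else:
            step *= 0.5                                    # otherwise shrink the step
    return Phi, Sigma, history

rng = np.random.default_rng(6)
N, G, b = 6, 3, 2
Sigmas = []
for _ in range(G):
    a = rng.standard_normal((N, N))
    Sigmas.append(a @ a.T + 0.1 * np.eye(N))
priors = np.array([0.5, 0.25, 0.25])
Phi1, Sigma, history = ida(Sigmas, priors, b)
```

Replacing `Sigma` and `Sigmas` with $P$ and $P_g$ turns the same routine into one AIDA block update, mirroring the correspondence noted in the text.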

After $k$ adaptive measurements, $y_{(k)}$, the identification of the Gaussian (classification) can be performed
using any local classifier. In particular, we can use the MAP criterion,

$$\hat{g} = \arg\min_{g} \left\{ y_{(k)}^T \Sigma_{y_{(k)}|g}^{-1} y_{(k)} + \log |\Sigma_{y_{(k)}|g}| \right\}, \tag{31}$$

for classification purposes (as used in Section V-C), or the MAP criterion indicated in Section II (Equations
(4) and (5)), which is based on the estimate of $x$ and is better for reconstruction purposes (used in Sections V-A and V-B).
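The classification rule in (31) is straightforward to evaluate once the sensing matrix used so far is known. The sketch below implements it as written (without a prior term); the function name `classify` and the two-class toy example are assumptions for illustration.

```python
import numpy as np

def classify(y, Phi, Sigmas, sigma2):
    """Return argmin_g of y^T S_g^-1 y + log|S_g|, with S_g = Phi Sigma_g Phi^T + sigma^2 I."""
    costs = []
    for Sigma_g in Sigmas:
        S = Phi @ Sigma_g @ Phi.T + sigma2 * np.eye(Phi.shape[0])
        costs.append(y @ np.linalg.solve(S, y) + np.linalg.slogdet(S)[1])
    return int(np.argmin(costs)), np.array(costs)

# Two zero-mean Gaussians whose energy lies along different axes.
Sigmas = [np.diag([100.0, 0.01]), np.diag([0.01, 100.0])]
Phi = np.eye(2)
g_hat, costs = classify(np.array([10.0, 0.0]), Phi, Sigmas, sigma2=0.01)
```

A measurement with most of its energy along the first axis is assigned to the first Gaussian, since the quadratic term of the second Gaussian becomes very large.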



B. Step 2: Reconstruction 

Having identified the Gaussian model for the signal in the first step, we focus now on finding adaptively 
the best reconstruction sensing matrix for this Gaussian (as mentioned before, this step can be skipped 
if the detected Gaussian/model is not of interest for reconstruction). One possibility is to use the optimal 
(in the MSE sense) sensing matrix for this Gaussian, as indicated in Section III. However, this approach 
disregards all the previous information (K measurements), and hence, it is in general suboptimal. 

At this step, we want to maximize the MI between the signal $x$ and the new measurements $y_k$ of size
$b$ (see also [27], where MI for kernel design is introduced and elegantly analyzed), taking into account
previous measurements $y_{(k-1)}$ (including those from the first step and any other previous measurements
at this second step), and also the now detected Gaussian, $\gamma \in \{1, \dots, G\}$, i.e., we want to find

$$\hat{\Phi}_k = \arg\max_{\Phi_k} I(y_k; x | y_{(k-1)}, \gamma), \quad \text{s.t. } \Phi_k \Phi_k^T = I_b. \tag{32}$$
2 The IDA equations presented here for k = 1 are indeed the general IDA equations, since the block size b can be of any size. 



January 27, 2012 



DRAFT 



IEEE TRANSACTIONS ON SIGNAL PROCESSING 






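By the definition of conditional mutual information [35],

$$I(y_k; x | y_{(k-1)}, \gamma) = E\left\{ \log \frac{p(y_k^\gamma, x_\gamma | y_{(k-1)}^\gamma)}{p(y_k^\gamma | y_{(k-1)}^\gamma)\, p(x_\gamma | y_{(k-1)}^\gamma)} \right\}, \tag{33}$$

where the knowledge of the Gaussian identity has been used to specify the signal $x_\gamma \sim \mathcal{N}(\mu_\gamma, \Sigma_\gamma)$,
previous measurements $y_{(k-1)}^\gamma = \Phi_{(k-1)} x_\gamma$, and new (unknown) measurements $y_k^\gamma = \Phi_k x_\gamma$.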
As detailed in Appendix D, (32) has a closed-form solution, given by

$$\hat{\Phi}_k = [u_1 \dots u_{M-K}]^T, \quad U = [u_1 \dots u_N], \tag{34}$$

$$P_\gamma = U \Lambda U^T, \quad P_\gamma = \Sigma_\gamma - \Sigma_\gamma \Phi_{(k-1)}^T \Sigma_{y_{(k-1)}|\gamma}^{-1} \Phi_{(k-1)} \Sigma_\gamma + \sigma^2 I_N.$$

Hence, the optimal (in the MI sense) CS matrix in the second step is given by the transposed first $M - K$
eigenvectors of the matrix $P_\gamma$, which is the same as in (27), but now for a known Gaussian $\gamma$. Notice
that if there are no previous measurements (non-adaptive CS), $P_\gamma = \Sigma_\gamma$ up to the noise term $\sigma^2 I_N$,
which leaves the eigenvectors unchanged, and the optimal (in the MI
sense) CS matrix for a known Gaussian corresponds to the transposed first $M - K$ eigenvectors of the
corresponding covariance matrix, which is exactly the optimal non-adaptive sensing matrix (in the MSE
sense) given in Section III.
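The closed-form second step can be sketched in a few lines. In the example below, the function name `second_step_sensing` and the diagonal toy covariance are assumptions, and the noise identity is taken as $N \times N$ so that $P_\gamma$ is well defined. The toy case illustrates the adaptivity: once the leading eigen-direction has already been measured, the new rows move to the next directions instead of repeating it.

```python
import numpy as np

def second_step_sensing(Sigma, Phi_prev, sigma2, n_new):
    """Rows = leading n_new eigenvectors of P_gamma; Phi_prev=None means no previous measurements."""
    N = Sigma.shape[0]
    if Phi_prev is None:
        P = Sigma + sigma2 * np.eye(N)
    else:
        S_prev = Phi_prev @ Sigma @ Phi_prev.T + sigma2 * np.eye(Phi_prev.shape[0])
        P = (Sigma
             - Sigma @ Phi_prev.T @ np.linalg.solve(S_prev, Phi_prev @ Sigma)
             + sigma2 * np.eye(N))
    vals, vecs = np.linalg.eigh(P)
    order = np.argsort(vals)[::-1]                   # decreasing eigenvalues
    return vecs[:, order[:n_new]].T, P

Sigma = np.diag([5.0, 3.0, 2.0, 1.0, 0.5])           # toy covariance of the detected Gaussian

# No previous measurements: the leading eigenvectors of Sigma are selected.
Phi_a, P_a = second_step_sensing(Sigma, None, 0.01, 2)

# The first direction was already sensed: the next two directions are selected instead.
Phi_prev = np.array([[1.0, 0.0, 0.0, 0.0, 0.0]])
Phi_b, P_b = second_step_sensing(Sigma, Phi_prev, 0.01, 2)
```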

C. Sequential Hypothesis Testing 

We defined AIDA in Section IV-A for the first step, where the task is classification. However, the 
number of measurements required in this step, K, was assumed to be a known parameter. We use here 
the sequential hypothesis testing (SHT) framework in [28] to automatically determine the number of 
measurements K in the first step. SHT has been shown [30], [31] to lead to fewer measurements on 
average, compared to hypothesis testing approaches that use a fixed number of measurements. 

The main idea is to use SHT to obtain a minimum possible number of measurements K in the 
first (classification) step, leaving as many samples (M — K) as possible for the second (reconstruction) 
step, where an optimal sensing can be performed given the detected Gaussian and all the previous 
measurements. It is clear, however, that an incorrect detection in the first step would lead to sub-optimal 
measurements in the second step. Hence, we need a way to decide when to stop, based on a given 
probability of classification error $P_e$,

$$P_e = \sum_{g=1}^{G} p(1, \dots, g-1, g+1, \dots, G)\, p(g), \tag{35}$$
Algorithm 1 Sequential Hypothesis Testing (SHT)

Require: $P_e$, $G$, $M$, $b$, $p(g = 1), \dots, p(g = G)$, $\Sigma_1, \dots, \Sigma_G$
  $k = 1$
  $\eta = \frac{1 - P_e}{P_e}$ {Stopping threshold}
  $\Phi_1 = \mathrm{IDA}(b, p(g = 1), \dots, p(g = G), \Sigma_1, \dots, \Sigma_G)$ {Initialization}
  while $kb < M$ do {Optimization}
    $k = k + 1$
    Initialize $\Phi_k$ (usually with a random sensing matrix)
    while $\mu(y_k; g | y_{(k-1)})$ increases (Equation (27)) do {Steepest ascent}
      $\Phi_k \leftarrow \Phi_k + \alpha \frac{\partial \mu(y_k; g | y_{(k-1)})}{\partial \Phi_k}$ {Using Equation (28)}
    end while
    $\Phi_{(k)} \leftarrow [\Phi_{(k-1)}^T \; \Phi_k^T]^T$, $y_k \leftarrow \Phi_k x$ {Updates CS matrix}
    for $g = 1 \to G$ do {Bayesian update of priors}
      $p(g) \leftarrow \frac{p(y_k | g)\, p(g)}{p(y_k)} = \frac{p(y_k | g)\, p(g)}{\sum_{i=1}^{G} p(y_k | g = i)\, p(g = i)}$
    end for
    for $i, j = 1 \to G$ do {Likelihood ratios}
      $L_{ij} = \frac{\prod_{l=1}^{k} p(y_{(l)} | g = i)\, p(g = i)}{\prod_{l=1}^{k} p(y_{(l)} | g = j)\, p(g = j)}$
    end for
    if $\exists \gamma : \forall j \ne \gamma, \; L_{\gamma j} > \eta$ then
      return $\gamma$, $\Phi_k$
    end if
  end while



where $p(1, \ldots, g-1, g+1, \ldots, G)$ represents the misclassification error probability when $g$ is the true 
Gaussian. This is precisely what SHT provides, based on Bayesian updates of the prior class probabilities 
and maximum likelihood. Algorithm 1 describes the SHT method in detail [28], [41]. 

We have provided an algorithm, based on hypothesis testing, that automatically determines the number of 
measurements required in the first step of the two-step framework for a given probability of classification 
error. In conjunction with AIDA, this provides the AIDA-SHT CS protocol listed in Table I. 
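
The stopping logic of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the AIDA-SHT implementation: for brevity, each new block of b sensing rows is drawn at random and orthonormalized instead of being optimized by steepest ascent, the class posterior is computed from the joint likelihood of all stacked measurements, and the threshold $\eta = (1 - P_e)/P_e$ is an assumed form. The names `gaussian_loglik` and `sht_classify`, and the default noise level, are hypothetical.

```python
import numpy as np


def gaussian_loglik(y, cov):
    """Log-density of a zero-mean Gaussian N(0, cov) evaluated at y."""
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (y.size * np.log(2.0 * np.pi) + logdet + quad)


def sht_classify(x, covs, priors, p_e, b=1, max_meas=None, sigma=1e-3, rng=None):
    """Sense x in blocks of b rows until one class dominates all others.

    Returns (detected class index, number of measurements used).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.size
    max_meas = n if max_meas is None else max_meas
    log_eta = np.log((1.0 - p_e) / p_e)   # assumed stopping threshold
    log_priors = np.log(np.asarray(priors, dtype=float))
    phi = np.empty((0, n))                # stacked sensing matrix Phi_(k)
    y = np.empty(0)                       # stacked measurements y_(k)
    best = int(np.argmax(log_priors))
    while phi.shape[0] + b <= max_meas:
        # Stand-in for the AIDA update: b random orthonormal rows.
        new_rows = np.linalg.qr(rng.standard_normal((n, b)))[0].T
        phi = np.vstack([phi, new_rows])
        y = np.concatenate([y, new_rows @ x + sigma * rng.standard_normal(b)])
        # Unnormalized log-posterior of each class given all measurements.
        noise = sigma ** 2 * np.eye(y.size)
        log_post = np.array(
            [gaussian_loglik(y, phi @ c @ phi.T + noise) for c in covs]
        ) + log_priors
        best = int(np.argmax(log_post))
        # Stop when every log-ratio against the best class exceeds log(eta).
        if np.all(log_post[best] - np.delete(log_post, best) >= log_eta):
            break
    return best, phi.shape[0]
```

The second returned value plays the role of K in the two-step framework: the remaining M - K measurements would then be designed for the detected Gaussian.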

V. Experimental Results 

A. Non-Adaptive Statistical Compressive Sensing 

We compare the reconstruction performance of the proposed non-adaptive sensing RIP-AB (Section 
III) with random sensing, the optimal non-adaptive sensing for unstructured dictionaries [9], [11], 
and the optimal sensing for structured dictionaries [10]. We use all 8×8 overlapping patches from 20 
natural images taken from the Berkeley segmentation dataset [42] in order to (offline) adapt the learned 
dictionaries (GMM) and sensing matrices (which depend on the PCA basis of the GMM, except for 
random sensing, of course) to each image. The (online) evaluation of the learned dictionaries and 
non-random sensing matrices was done using non-overlapping 8×8 patches. Only n = 11 iterations of the 
MAP-EM statistical CS were needed to learn the dictionaries and non-random sensing matrices.³ 

TABLE II 

Mean Image Reconstruction PSNR (dB), non-overlapping patches. 

                     Sensing Matrix 
Compression Ratio    Random    Optimal Unstructured    Optimal Structured    RIP-AB GMM 
8.0                  28.04     28.74                   28.93                 29.30 
5.3                  29.86     30.42                   30.59                 30.98 
3.2                  32.90     33.47                   33.53                 34.0 



Table II shows the mean peak signal-to-noise ratio (PSNR) of the reconstructed images (from the 
reconstructed non-overlapping patches), for three levels of signal compression, where 
$\mathrm{PSNR} = 10 \log_{10}\left( I_{\max}^2 / \mathrm{MSE} \right)$, with 
$I_{\max}$ corresponding to the maximum image intensity. The RIP-AB sensing strategy achieves the best 
reconstruction PSNRs, as expected. Using a paired t-test, we found that the differences are statistically 
significant, with a p-value less than 0.001. Figure 1 shows the original and reconstructed 8×8 
non-overlapping patches for a selected image (results with other selected images are also shown in the 
supplementary material, figures S1-S2). The quantitative improvements are visually observed as well. 
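
For reference, the PSNR definition above can be computed as in the following short sketch. The function name `psnr` and the default maximum intensity of 255 (8-bit images) are assumptions made for illustration; the source does not state the intensity range used.

```python
import numpy as np


def psnr(original, reconstructed, i_max=255.0):
    """PSNR in dB: 10 * log10(i_max^2 / MSE)."""
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    mse = np.mean((original - reconstructed) ** 2)
    if mse == 0.0:
        return float("inf")   # identical images
    return 10.0 * np.log10(i_max ** 2 / mse)
```

For an image assembled from non-overlapping patches, the MSE is taken over all pixels of the reassembled image.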

B. Adaptive Statistical Compressive Sensing 

We now compare the classification and reconstruction performance of the proposed two-step adaptive 
statistical CS framework (Section IV), with several configurations as indicated in Table I. For this purpose, 
we generate two-class synthetic Gaussian signals of dimensions 36, 64, and 100, where the Bhattacharyya 
distance (BD) between them is varied as well, see Appendix E for details. 

In addition, we use 6×6, 8×8, and 10×10 non-overlapping patches extracted from 50 natural 
images from the Berkeley segmentation dataset [42]. Three levels of noise were considered for both the 
synthetic and natural image signals: no added noise, an SNR of 40 dB, and an SNR of 30 dB. There are 
19 learned classes in this case, used to efficiently model patches of natural images 
(see [13] for details). 

³ We require more MAP-EM iterations than those indicated in Section II, since now the sensing matrix $\Phi$ and the dictionary 
(GMM) are both being adapted simultaneously. 

The dictionary is here not adapted to each image, in order to provide the same GMM to all the 
considered two-step configurations (Table I), so that the differences between configurations are due only 
to the sensing itself and not to the adapted dictionaries. 

1) Synthetic Signals: Figure 2 shows the classification error at Step 1 and the corresponding MSE 
at Step 2, for the indicated BDs and three levels of noise. As expected, the best classification results 
are obtained for IDA and AIDA-SHT, with AIDA-SHT being the best and stopping automatically between two 
and three samples, on average. The worst classification results were obtained for the RIP-AB 
non-adaptive sensing matrix and random sensing. Despite the relatively poor classification 
performance of RIP-AB, its reconstruction is equal to or better than that of random sensing, and it 
improves as K → M, as expected, since RIP-AB is better than random for non-adaptive single-step batch CS 
(Sections III and V-A), i.e., when K = M. RIP-AB does better than random on the final reconstruction, 
using K < M measurements, because the BD between the two Gaussians is relatively small; hence, selecting 
the wrong Gaussian is not that harmful for reconstruction during the second step. The lowest reconstruction 
errors are achieved using AIDA-SHT in the first step and the optimal (in the MI sense) adaptive sensing in 
the second step. However, as the noise increases, the reconstruction performance of AIDA-SHT degrades 
with respect to IDA. The reason is that IDA uses only the (clean) information provided by the 
covariance matrices, while SHT tests hypotheses based on the noisy CS signals. Nevertheless, AIDA-SHT 
stops automatically, without knowing a priori the number of steps K required in the first step, while IDA 
(or AIDA alone) requires a pre-defined number of steps K. 

It can also be observed in Figure 2 that using IDA in the first step and the non-adaptive optimal sensing 
for the selected Gaussian in the second step is worse than using random sensing in the first step and the 
non-adaptive optimal sensing for the selected Gaussian in the second, despite the fact that IDA has very 
good classification performance in the first step. The reason is that IDA uses information from the average 
covariance matrix, which is likely to lead to a first-step sensing matrix that is correlated with the sensing 
matrix in the second step, especially when the number of classes is small (two in this case). This shows the 
importance of adaptivity in the second step. Random sensing is not affected, since it is very unlikely that a 
random sensing matrix will be correlated with the non-adaptive optimal sensing matrix in the second 
step. RIP-AB sensing does not seem to be affected either, probably because it does not use the average 
covariance matrix. 

Figure 3 shows the classification error at the first step and the corresponding reconstruction MSE at the 
second step, for larger BDs than those in Figure 2, with the same noise levels. All classification accuracies 
are now better than in Figure 2, since the distance between the covariance matrices is larger, making 
classification easier. AIDA-SHT still has the best classification accuracy, but the difference with IDA is 
reduced. In the second step, the reconstruction performance using AIDA-SHT in the first step deteriorates 
more with noise, and in this case IDA is better. We believe the reason is that the adaptive sensing 
matrix of AIDA is much better than that of IDA for classification, but good features for classification are 
not necessarily good for reconstruction. In addition, since classification becomes easier at larger BDs, 
the need for a good classifier in the first step is reduced, while reconstruction becomes more important. 

Due to space limitations, the results for other signal sizes and BDs are shown only in the supplementary 
material. As can be seen from the figures there (figures S3-S40), the behavior is quite similar to the 
results just presented. 

2) Patches from Natural Images: Figure 4 shows the mean PSNR for the non-overlapping 6×6 
patches extracted from 50 natural images at the three noise levels considered. Here we do not know the 
right class for each of the signals; hence, we only report the reconstruction accuracy in the second 
step. The best reconstruction performance in the noise-free case is obtained for AIDA-SHT, and the 
classification (first step) stops between 5 and 6 samples on average. As the noise increases, the difference 
between AIDA and IDA is reduced. This follows the same behavior observed with synthetic data, where 
AIDA-SHT degrades with noise. There is a wave-like behavior in the non-adaptive protocols used, with 
the exception of random sensing, probably due to the possible coherence of the sensing matrix between 
steps, which depends on K. Results for non-overlapping patches of sizes 8×8 and 10×10 can be found 
in the supplementary material (figures S41 and S42). 

Figure 5 shows the original and reconstructed 6×6 non-overlapping patches for a selected image using 
the different two-step CS configurations considered (Table I), K = 5, and no noise added (AIDA-SHT 
uses K = 5.41 on average). Using IDA and AIDA-SHT in the first step and the optimal adaptive (MI) sensing 
in the second step produces the best reconstruction performance, with PSNRs that are approximately 6 dB 
above random sensing in the first step. Notice that the reported reconstructions do not use dictionary (and 
sensing) adaptation, which explains the relatively poor quality of the reconstructed patches using random 
and RIP-AB sensing in the first step (compare to the results reported in Section V-A). Results with other 
selected images and patch sizes can be found in the supplementary material (figures S43-S53). 

Note that the results using the proposed two-step framework (figures 2-4 and S3-S42) also include the 
single-step batch results for random, RIP-AB, and IDA, for the particular case when K = M. Hence, 
from these figures, we can conclude that in general the proposed two-step framework does provide better 
reconstructions than single-step (batch) random, RIP-AB, and IDA. AIDA-SHT should not be run in 
batch mode, since it is fully adaptive. 

In summary, AIDA-SHT in the first step and the adaptive optimal (MI) sensing matrix in the second 
is the best sensing protocol for the proposed two-step SCS, provided that the noise level is not too high. 
A good (computationally less expensive) alternative to AIDA-SHT in the first step is the well-known 
non-adaptive batch IDA, provided that we know a priori the optimal number of measurements K needed 
for class detection in the first step. 

C. Classification 

In this section we test on the Statlog (Landsat Satellite) dataset, consisting of six classes with 36 
numerical attributes and 6435 instances [43]. For this dataset, we used dictionary learning and data 
adaptation. As Figure 6a shows, only IDA achieves good classification accuracy when training the 
dictionaries (GMM with 6 classes). Hence, in order to remove the effect of dictionary adaptation, we used 
the best learned dictionary for this dataset (obtained using IDA) for all the remaining sensing matrices. 
Figure 6b shows the classification accuracy using this GMM: IDA and AIDA-SHT achieve the best 
classification performance. AIDA-SHT automatically stops at five measurements on average. 

These classification results correspond to the classification of each CS signal independently of the 
other CS signals, and within the proposed two-step framework. Therefore, they cannot be directly compared 
with classifiers that use all the CS data at the same time.⁴ These results show that classification accuracy 
on real data closely follows the results obtained using synthetic data. In addition, if we 
are only interested in sensing a given class, we can stop sensing a signal as soon as we determine in the 
first step that its class is of no interest, thus reducing the average number of measurements. Indeed, the 
total number of measurements for single-step (adaptive or not) sensing, with S signals and M measurements, 
is SM, but if we are only interested in a given class $\gamma$, the average number of measurements reduces 
to $S(K(1 - p(\gamma)) + M p(\gamma))$, where $p(\gamma)$ is the probability of class $\gamma$. 
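
As a quick illustration of the saving expressed by this formula, the sketch below evaluates it for hypothetical values (1000 signals, M = 16, K = 4, and a class of interest with probability 0.25). Neither the function name nor the numbers come from the experiments reported here.

```python
def average_measurements(num_signals, m_total, k_first, p_class):
    """Average number of measurements when only one class is of interest."""
    return num_signals * (k_first * (1.0 - p_class) + m_total * p_class)


# Hypothetical numbers: S = 1000 signals, M = 16, K = 4, p(gamma) = 0.25.
two_step = average_measurements(1000, 16, 4, 0.25)
single_step = 1000 * 16
print(two_step, single_step)   # prints: 7000.0 16000
```

In this made-up setting, stopping after the first step for the uninteresting classes would cut the measurement budget to less than half of the single-step total SM.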

Finally, the proposed AIDA or AIDA-SHT could be used for classification purposes only (single step), 
where the final classification is achieved using an offline classifier that uses all the CS data, such as 
support vector machines, neural networks, or Bayesian classifiers. 

VI. Conclusion 

We have developed a novel two-step SCS framework tailored to GMMs, from which several SCS 
protocols can be chosen (Table I). The best sensing protocol, in terms of accuracy of classification in 
the first step, reconstruction in the second step, and automatic selection of the number of samples, is 
AIDA-SHT in the first step and the optimal (MI) adaptive sensing in the second step. A good alternative 
to AIDA-SHT in the first step is a batch non-adaptive IDA or a batch AIDA with a predefined number 
of measurements, which seems more robust to noise and deviations from the GMM assumption. The 
two-step SCS is clearly superior to a batch single-step SCS: the reconstruction accuracy is equal to or 
better than that of a single-step SCS, and the average total number of measurements is lower, since we can 
stop sensing a given signal if in the first step we determine that the signal is of no interest. 

⁴ If we take, however, all the CS signals and use a quadratic Bayesian classifier, we obtain a classification accuracy of 82%, 
which is similar to the results reported in [37]. 

The two-step SCS framework presented here also applies to a single step adaptive or batch sensing 
approach, for the particular case when K = M. More general GMMs than the single Gaussian per signal 
studied here, or other non-Gaussian models, can also be considered within this framework, at the cost of 
higher mathematical and possibly computational complexity. 

Appendix A 
Solution to Equation (14) 

The general orthogonal Procrustes problem is defined as [36] 

$$X = \arg\min_{X} \|AX - C\|_F^2, \quad \text{s.t. } XX^T = I. \tag{36}$$

Let $M = A^T C$, and $M = U \Lambda W^T$ its singular value decomposition. The solution to (36) is given by $X = UW^T$. 
Now, noticing that $\|A\|_F^2 = \|A^T\|_F^2$, we can rewrite (14) as 

(37) 

Comparing (37) and (36) one can see that $M = \Sigma$, and if $\Sigma = U \Lambda W^T$, then the solution to (37) is 
$B^T = UW^T$, hence, $B = WU^T$. 

Appendix B 
Derivation of Equation (18) 

By Bayes' theorem, we know that 

$$p(y_k \mid g, y_{(k-1)}) = \frac{p(y_k, g \mid y_{(k-1)})}{p(g \mid y_{(k-1)})}. \tag{38}$$

Hence, (17) becomes 

$$I(y_k; g \mid y_{(k-1)}) = E\left\{ \log \frac{p(y_k \mid g, y_{(k-1)})}{p(y_k \mid y_{(k-1)})} \right\}. \tag{39}$$

Also by Bayes, 

$$p(y_k \mid g, y_{(k-1)}) = \frac{p(y_k, y_{(k-1)} \mid g)}{p(y_{(k-1)} \mid g)} = \frac{p(y_{(k)} \mid g)}{p(y_{(k-1)} \mid g)}, \tag{40}$$

$$p(y_k \mid y_{(k-1)}) = \frac{p(y_k, y_{(k-1)})}{p(y_{(k-1)})} = \frac{p(y_{(k)})}{p(y_{(k-1)})}. \tag{41}$$

Replacing (40) and (41) in (39), and applying logarithm properties, 

$$I(y_k; g \mid y_{(k-1)}) = E\{\log p(y_{(k)} \mid g)\} - E\{\log p(y_{(k)})\} - \left( E\{\log p(y_{(k-1)} \mid g)\} - E\{\log p(y_{(k-1)})\} \right). \tag{42}$$

Applying the definition of differential entropy, $H(f) = -E\{\log f\}$, we arrive at Equation (18). 
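
As a numerical companion to Appendix A, the following sketch solves the orthogonal Procrustes problem (36) with an SVD and checks the result on synthetic data. The function name `procrustes` and the test matrices are illustrative assumptions, not part of the original derivation.

```python
import numpy as np


def procrustes(A, C):
    """Return the orthogonal X minimizing ||A X - C||_F, i.e. X = U W^T."""
    U, _, Wt = np.linalg.svd(A.T @ C)   # M = A^T C = U diag(s) W^T
    return U @ Wt


rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # a known orthogonal matrix
X = procrustes(A, A @ Q)                           # C = A Q, so X should equal Q
print(np.allclose(X, Q), np.allclose(X @ X.T, np.eye(4)))
```

Because C is built as A Q with A of full column rank, the minimizer is unique and should coincide with Q, which gives a simple sanity check of the closed-form solution.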



Appendix C 
Derivation of Equation (27) 

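The covariance of $y_{(k)}$, given Gaussian class $g$, is given by 

$$\Sigma_{y_{(k)}|g} = \Phi_{(k)} \Sigma_g \Phi_{(k)}^T + \sigma^2 I = \begin{bmatrix} \Phi_{(k-1)} \\ \Phi_k \end{bmatrix} \Sigma_g \begin{bmatrix} \Phi_{(k-1)}^T & \Phi_k^T \end{bmatrix} + \sigma^2 I = \begin{bmatrix} \Sigma_{y_{(k-1)}|g} & \Phi_{(k-1)} \Sigma_g \Phi_k^T \\ \Phi_k \Sigma_g \Phi_{(k-1)}^T & \Phi_k \Sigma_g \Phi_k^T + \sigma^2 I_b \end{bmatrix}.$$

Since by hypothesis x is an N-dimensional Gaussian with positive definite covariance $\Sigma_g$ (g unknown 
of course), then the covariance matrix $\Sigma_{y_{(k-1)}|g}$ is also positive definite [38], hence, invertible, 
and we can use the identity for determinants 

$$\left| \begin{matrix} A & B \\ C & D \end{matrix} \right| = |A| \, |D - C A^{-1} B| \quad [40], [44].$$

Then, 

$$|\Sigma_{y_{(k)}|g}| = |\Sigma_{y_{(k-1)}|g}| \, \left| \Phi_k \left( \Sigma_g - \Sigma_g \Phi_{(k-1)}^T \Sigma_{y_{(k-1)}|g}^{-1} \Phi_{(k-1)} \Sigma_g + \sigma^2 I_b \right) \Phi_k^T \right|, \tag{43}$$

where in the last step we used the fact that $I_b = \Phi_k \Phi_k^T$. 

Starting from (24), and following the same steps indicated before, it is straightforward to show that 

$$|\Sigma_{y_{(k)}}| = |\Sigma_{y_{(k-1)}}| \, \left| \Phi_k \left( \Sigma - \Sigma \Phi_{(k-1)}^T \Sigma_{y_{(k-1)}}^{-1} \Phi_{(k-1)} \Sigma + \sigma^2 I_b \right) \Phi_k^T \right|. \tag{44}$$

Replacing (43) and (44) in (25), and defining $P_g$ and $P$ as indicated in (27), we arrive at Equation (27). 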
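Appendix D 
Derivation of Equation (34) 

By Bayes' theorem, 

$$p(y_k^\gamma \mid x_\gamma, y_{(k-1)}^\gamma) = \frac{p(y_k^\gamma, y_{(k-1)}^\gamma \mid x_\gamma)}{p(y_{(k-1)}^\gamma \mid x_\gamma)} = \frac{p(y_{(k)}^\gamma \mid x_\gamma)}{p(y_{(k-1)}^\gamma \mid x_\gamma)}. \tag{45}$$

Also, by Bayes, 

$$p(y_k^\gamma \mid y_{(k-1)}^\gamma) = \frac{p(y_k^\gamma, y_{(k-1)}^\gamma)}{p(y_{(k-1)}^\gamma)} = \frac{p(y_{(k)}^\gamma)}{p(y_{(k-1)}^\gamma)}. \tag{46}$$

Replacing (45) and (46) in (34), 

$$I(y_k^\gamma; x_\gamma \mid y_{(k-1)}^\gamma) = E\left[ \log \frac{p(y_{(k)}^\gamma \mid x_\gamma)\, p(y_{(k-1)}^\gamma)}{p(y_{(k)}^\gamma)\, p(y_{(k-1)}^\gamma \mid x_\gamma)} \right], \tag{47}$$

and applying logarithm properties and the definition of differential entropy, 

$$I(y_k^\gamma; x_\gamma \mid y_{(k-1)}^\gamma) = H(p(y_{(k)}^\gamma)) - H(p(y_{(k)}^\gamma \mid x_\gamma)) - \left[ H(p(y_{(k-1)}^\gamma)) - H(p(y_{(k-1)}^\gamma \mid x_\gamma)) \right]. \tag{48}$$

The term within brackets in (48) is independent of $\Phi_k$, so it can be considered a constant. The 
probability $p(y_{(k)}^\gamma) = p(y_{(k)} \mid g = \gamma)$ is the same as (19), replacing g by $\gamma$, and its entropy is given by 
[35] 

$$H(p(y_{(k)}^\gamma)) = \frac{kb}{2} \left( 1 + \log(2\pi) \right) + \frac{1}{2} \log |\Sigma_{y_{(k)}^\gamma}|, \qquad \Sigma_{y_{(k)}^\gamma} = \Phi_{(k)} \Sigma_\gamma \Phi_{(k)}^T. \tag{49}$$

On the other hand, 

$$p(y_{(k)}^\gamma \mid x_\gamma) = \frac{1}{(2\pi\sigma^2)^{kb/2}} \exp\left\{ -\frac{1}{2\sigma^2} (y_{(k)}^\gamma - \Phi_{(k)} x_\gamma)^T (y_{(k)}^\gamma - \Phi_{(k)} x_\gamma) \right\}, \tag{50}$$

which has entropy $H(p(y_{(k)}^\gamma \mid x_\gamma)) = \frac{kb}{2} \left( 1 + \log(2\pi\sigma^2) \right)$, and is independent of $\Phi_k$. Hence, Equation (33) can 
be rewritten as, 

$$I(y_k^\gamma; x_\gamma \mid y_{(k-1)}^\gamma) = \frac{1}{2} \log |\Sigma_{y_{(k)}^\gamma}| + F(y_{(k-1)}^\gamma), \tag{51}$$

where $F(y_{(k-1)}^\gamma)$ accounts for all terms dependent on $y_{(k-1)}^\gamma$ plus some constants. Hence, in order to 
solve (32), we need to maximize $\log |\Sigma_{y_{(k)}^\gamma}|$. Since the logarithm is a monotonic function, it is enough 
to maximize $|\Sigma_{y_{(k)}^\gamma}|$. Now, following the same steps indicated in (43), we arrive at 

$$|\Sigma_{y_{(k)}^\gamma}| = |\Sigma_{y_{(k-1)}^\gamma}| \, \left| \Phi_k \left( \Sigma_\gamma - \Sigma_\gamma \Phi_{(k-1)}^T \Sigma_{y_{(k-1)}^\gamma}^{-1} \Phi_{(k-1)} \Sigma_\gamma + \sigma^2 I_b \right) \Phi_k^T \right|. \tag{52}$$

Let $P_\gamma = \Sigma_\gamma - \Sigma_\gamma \Phi_{(k-1)}^T \Sigma_{y_{(k-1)}^\gamma}^{-1} \Phi_{(k-1)} \Sigma_\gamma + \sigma^2 I_b$, and since $|\Sigma_{y_{(k-1)}^\gamma}|$ is independent of $\Phi_k$, the solution 
to (32) reduces to, 

$$\Phi_k = \arg\max_{\Phi_k} \left| \Phi_k P_\gamma \Phi_k^T \right|, \quad \text{s.t. } \Phi_k \Phi_k^T = I. \tag{53}$$

This corresponds to a high dimensional extension of the Rayleigh-Ritz Theorem, whose solution is 
given by Equation (34) (see proof of this theorem in [44]). 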
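
The determinant maximization in (53) can be checked numerically. The sketch below assumes, following the description of Equation (34) in the main text, that the maximizer has as rows the leading eigenvectors of the symmetric matrix $P_\gamma$; the helper name `top_eigvec_rows` and the random test matrix are illustrative only.

```python
import numpy as np


def top_eigvec_rows(P, m):
    """Rows = the m eigenvectors of symmetric P with the largest eigenvalues."""
    _, vecs = np.linalg.eigh(P)          # eigenvalues returned in ascending order
    return vecs[:, ::-1][:, :m].T


rng = np.random.default_rng(1)
B = rng.standard_normal((8, 8))
P = B @ B.T + 0.1 * np.eye(8)            # a synthetic positive definite P
phi_opt = top_eigvec_rows(P, 3)
phi_rand = np.linalg.qr(rng.standard_normal((8, 3)))[0].T   # random orthonormal rows
det_opt = np.linalg.det(phi_opt @ P @ phi_opt.T)
det_rand = np.linalg.det(phi_rand @ P @ phi_rand.T)
print(det_opt >= det_rand)
```

Here `det_opt` equals the product of the three largest eigenvalues of P, which no other set of three orthonormal rows can exceed.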
Appendix E 

Synthetic Multidimensional Gaussian Distributions 

Let R be an N×N Gaussian random matrix and $R = U \Lambda V^T$ its corresponding singular value 
decomposition. We define synthetic covariance matrices as $\Sigma = U \Lambda U^T$, $\Lambda_{ii} = r \beta i^{-\omega}$, where $r \in (0\ 1]$, 
$\beta \in [4\ 8]$, $\omega \in \{3, 4\}$. These ranges were chosen empirically, in order to obtain Bhattacharyya 
distances (BDs) in the range $[30\ +\infty)$, where the BD between two covariance matrices $\Sigma_1$ and $\Sigma_2$, with 
average $\Sigma = (\Sigma_1 + \Sigma_2)/2$, is given by $\mathrm{BD} = \frac{1}{2} \ln\left( \frac{|\Sigma|}{\sqrt{|\Sigma_1| |\Sigma_2|}} \right)$. 

More specifically, we obtained around 100 pairs of covariance matrices with BDs in each of the 
following ranges: [30 46), [46 62), [62 78), [78 94), [94 110), [110 126), [126 142), and [142 +∞), for 
a total of 800 pairs of covariance matrices. Bhattacharyya distances among patches from natural images 
are in the [30 61] range, which justifies the chosen ranges. Larger Bhattacharyya distances than those 
considered here are quite difficult to obtain at random and often lead to numerical instability. 
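
A minimal sketch of this construction is given below. It assumes the eigenvalue profile $\Lambda_{ii} = r \beta i^{-\omega}$ (the exact form is hard to read in the source, so this is a guess) and the standard Bhattacharyya distance between zero-mean Gaussians. The function names are hypothetical, and log-determinants are used to avoid underflow for rapidly decaying spectra.

```python
import numpy as np


def synthetic_covariance(n, r, beta, omega, rng):
    """Covariance U diag(lam) U^T, with U from the SVD of a Gaussian random matrix."""
    R = rng.standard_normal((n, n))
    U, _, _ = np.linalg.svd(R)
    # Assumed eigenvalue profile: lam_i = r * beta * i^(-omega), i = 1..n.
    lam = r * beta * np.arange(1, n + 1, dtype=float) ** (-omega)
    return (U * lam) @ U.T


def bhattacharyya(cov1, cov2):
    """BD = 0.5 * ln(|avg| / sqrt(|cov1| |cov2|)) for zero-mean Gaussians."""
    avg = 0.5 * (cov1 + cov2)
    _, logdet_avg = np.linalg.slogdet(avg)
    _, logdet_1 = np.linalg.slogdet(cov1)
    _, logdet_2 = np.linalg.slogdet(cov2)
    return 0.5 * (logdet_avg - 0.5 * (logdet_1 + logdet_2))
```

Pairs of covariances could then be drawn repeatedly and binned by their BD until each range holds the desired number of pairs.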
Acknowledgments: Work supported by ONR, NSF, DARPA, NGA, ARO, and NSSEFF. We thank 
Prof. Robert Calderbank, Prof. David Brady, and Prof. Stanley Osher for very constructive comments 
and discussions. 

Fig. 1. Reconstructed image from learned dictionaries and non-overlapping patches of size 8×8 (CS to 12 samples). a) Original, 
b) Random (29.1 dB), c) Unstructured/Structured (29.2 dB), d) RIP-AB (32.1 dB). 

Fig. 2. Classification accuracy (step 1) and reconstruction MSE (step 2) for synthetic signals of dimension 64 (CS to 16 
samples) and BDs ∈ [30 46). a) No noise, b) SNR of 40 dB, c) SNR of 30 dB. 

Fig. 3. Classification accuracy (step 1) and reconstruction MSE (step 2) for synthetic signals of dimension 64 (CS to 16 
samples) and BDs ∈ [62 78). a) No noise, b) SNR of 40 dB, c) SNR of 30 dB. 

Fig. 5. Reconstructed image from non-overlapping patches of size 6×6 (CS to 6 samples) using the following two-step protocols: 
a) Original, b) Random - Optimum (MSE) non-adaptive (26.8 dB), c) RIP-AB - Optimum (MSE) non-adaptive (27.66 dB), 
d) IDA - Optimum (MSE) non-adaptive (30.51 dB), e) IDA - Optimum (MI) adaptive (30.77 dB), and f) AIDA-SHT - Optimum 
(MI) adaptive (33.9 dB). 

Computational Complexity 

We analyze here the worst-case computational complexity for each of the two-step configurations 
indicated in Table I. Let us start by considering the simplest, most general non-adaptive batch statistical 
CS, where the dictionary (GMM) is learned and $M \ll N$ random measurements are used. The CS 
complexity is the same for all the methods considered here, since they all use M measurements per signal. 
Since CS consists of an M×N matrix by N×1 vector multiplication for each signal, the complexity of 
CS is $O(kSMN)$, where S is the total number of signals and k is the number of MAP-EM iterations 
until convergence. The complexity required to estimate the original signal x (E-step) is dominated by 
the matrix multiplication in the E-step (Equation (5)) for each Gaussian, which is $O(kSGMN^2)$. 
The complexity of the M-step is dominated by the update of the PCA basis (Equation (2)), which is 
$O(kGN^3)$, since there are G Gaussians. Hence, the overall computational complexity of a random block 
GMM SCS with dictionary (and CS matrix) learning is $O(kSGMN^2) + O(kGN^3) + O(kSMN)$. Since 
$S \gg N$, the dominant time complexity is $O(kGSMN^2)$. Of course, dictionary (GMM) learning needs to 
be done only once (offline); hence, the complexity for an already learned dictionary and random sensing 
plus decoding becomes $O(GSMN^2)$. 

The simplest configuration in Table I uses random sensing in the first step and the non-adaptive optimal 
(MSE) sensing for the estimated Gaussian in the second step. In this case, the computational complexity 
in the first and second steps is again dominated by the matrix multiplication in (5), which is $O(kSGMN^2)$. 
Since reconstruction must be done twice (once in each step), the overall increase in computational cost 
with respect to single-step random sensing is a factor of two, which is marginal, and this extra cost is 
offset by the improvement in the reconstructions obtained using the two-step framework (see Section V). 
On the other hand, the deterministic RIP-AB sensing matrix needs to be computed only once at every 
iteration of the MAP-EM algorithm; hence, its computational cost is only $O(nGN^3)$. Moreover, the 
Wiener filter in Equation (5) is the same for all signals and also has a computational cost of 
$O(kGN^3)$; hence, the overall computational cost of RIP-AB is $O(kG(N^3 + SMN))$, accounting for the matrix 
(Wiener filter) by vector (CS signal) multiplications in Equation (5). Note that we cannot precompute 
the Wiener filter when using a random sensing matrix, since it changes for every new signal (as is 
recommended for improved sensing results, e.g., [19]). The RIP-AB matrix in the first step and the G 
possible optimal (MSE) non-adaptive matrices in the second step can all be computed offline, before the 
sensing begins. 
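
The saving described above, one Wiener filter per Gaussian shared by all signals when the sensing matrix is fixed, can be sketched as follows. The filter is written in the standard linear MAP form for a zero-mean Gaussian prior, which is assumed here to match Equation (5); `wiener_filter` and `reconstruct_all` are hypothetical names.

```python
import numpy as np


def wiener_filter(cov, phi, sigma2):
    """N x M filter W = cov phi^T (phi cov phi^T + sigma2 I)^{-1}."""
    gram = phi @ cov @ phi.T + sigma2 * np.eye(phi.shape[0])
    # gram and cov are symmetric, so (gram^{-1} phi cov)^T = cov phi^T gram^{-1}.
    return np.linalg.solve(gram, phi @ cov).T


def reconstruct_all(Y, filters):
    """Apply each precomputed filter to all S signals at once (Y is M x S)."""
    return [W @ Y for W in filters]


# Toy example: identity covariance, sensing the first two of four coordinates.
cov = np.eye(4)
phi = np.eye(4)[:2]
filters = [wiener_filter(cov, phi, sigma2=0.0)]
Y = np.array([[1.0, 2.0], [3.0, 4.0]])
estimates = reconstruct_all(Y, filters)
```

Each filter is built once per Gaussian, and every signal then costs only one N×M matrix-vector product per Gaussian, which is where the SMN term in the RIP-AB complexity comes from.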

Now, the computational complexity of IDA (assuming a steepest ascent approach, see Equation (30)) 
is given by $O(\chi K^3)$, corresponding to $\chi$ steepest ascent iterations, where the computational complexity 
of each iteration is given by the cost of inverting K×K matrices, plus matrix multiplication of the same 
size. Since K < M, the worst case time complexity of IDA is $O(\chi M^3)$. Given that the sensing matrix 
needs to be computed only once every MAP-EM iteration,¹ and the Wiener filters are the same for all 
signals, the computational complexity (including E-M steps) of IDA is given by $O(k(\chi M^3 + GSMN))$, 
where $\chi$ is in the order of $10^3$ to $10^4$. 

Finally, the computational complexity of AIDA-SHT is given by $O(kS\chi K^4)$, where the computational 
cost is dominated by K/b IDA-like steepest ascent iterations for each signal (see Algorithm 1). Since 
K < M, the worst case scenario of AIDA-SHT is given by $O(kS\chi M^4)$, for b = 1. In the second step, the 
optimal (MI) adaptive sensing matrix can be pre-computed for every possible length K of the AIDA-SHT 
sensing matrix obtained in the first step and for every Gaussian. The cost in the second step would be 
$O(GN^3)$, but it needs to be paid only once (assuming a dictionary already learned). Hence, AIDA-SHT plus 
the optimal (MI) adaptive sensing matrix in the second step has a complexity of $O(kS(\chi M^4 + GMN^2))$, 
since the Wiener filters change for each signal. Since $\chi$ is usually of the order of $10^3$ to $10^4$ and S 
can be of the order of $10^5$ for non-overlapping patches and $10^7$ for overlapping patches (depending, of 
course, on the image size), the time complexity of AIDA-SHT in the first step imposes a significant extra 
computational cost and should be done offline if possible, with a previously learned dictionary. 

Even though the proposed AIDA-SHT is the most expensive computationally, it has a strong theoretical 
justification and can improve classification and reconstruction accuracies (see Section V). Further 
work is necessary to reduce the time complexity of AIDA-SHT. One possibility is to theoretically estimate 
the expected minimum number of adaptive samples K in the first step, for a given GMM and probability of 
classification error $P_e$. Another possibility is to learn, by Monte Carlo simulations or direct experimentation, 
the number of samples required in the first step for a given class of signals. 



¹ Despite the fact that IDA is usually initialized with a random sensing matrix, IDA (and AIDA) might be run several times 
offline, producing a good solution that is fixed (deterministic) for all signals. 

Non-Adaptive Statistical Compressive Sensing 

Fig. 1. Image reconstructed from learned dictionaries and non-overlapping patches of size 8×8 (CS to 12 samples). a) Original, 
b) Random (23.2 dB), c) Unstructured/Structured (23.3 dB), d) RIP-AB (26.6 dB). 

Fig. 2. Image reconstructed from learned dictionaries and non-overlapping patches of size 8×8 (CS to 12 samples). a) Original, 
b) Random (23.9 dB), c) Unstructured/Structured (23.9 dB), d) RIP-AB (26.4 dB). 

Adaptive Statistical Compressive Sensing - Synthetic Data 

Fig. 3. Classification accuracy (step 1), synthetic signals of dimension 36 (CS to 6 samples), BD ∈ [30 46). a) No noise, b) 
SNR of 40 dB, c) SNR of 30 dB. 



January 27, 2012 



DRAFT 



[Fig. 4, panels (a)-(c). Horizontal axis: k (samples). Legend: Random + Opt (MSE), non-adaptive; RIP-AB + Opt (MSE), 
non-adaptive; IDA + Opt (MI), non-adaptive; IDA + Opt (MI), adaptive; AIDA-SHT + Opt (MI), adaptive.] 



Fig. 4. MSE (step 2), reconstructed synthetic signals of dimension 36 (CS to 6 samples), BD ∈ [30 46). a) No noise, b) SNR 
of 40 dB, c) SNR of 30 dB. 

Fig. 5. Classification accuracy (step 1), synthetic signals of dimension 36 (CS to 6 samples), BD ∈ [46 62). a) No noise, b) 
SNR of 40 dB, c) SNR of 30 dB. 

Fig. 6. MSE (step 2), reconstructed synthetic signals of dimension 36 (CS to 6 samples), BD ∈ [46 62). a) No noise, b) SNR 
of 40 dB, c) SNR of 30 dB. 

Fig. 7. Classification accuracy (step 1), synthetic signals of dimension 36 (CS to 6 samples), BD ∈ [62 78). a) No noise, b) 
SNR of 40 dB, c) SNR of 30 dB. 

Fig. 8. MSE (step 2), reconstructed synthetic signals of dimension 36 (CS to 6 samples), BD ∈ [62 78). a) No noise, b) SNR 
of 40 dB, c) SNR of 30 dB. 

Fig. 9. Classification accuracy (step 1), synthetic signals of dimension 36 (CS to 6 samples), BD ∈ [78 94). a) No noise, b) 
SNR of 40 dB, c) SNR of 30 dB. 

Fig. 10. MSE (step 2), reconstructed synthetic signals of dimension 36 (CS to 6 samples), BD ∈ [78 94). a) No noise, b) SNR 
of 40 dB, c) SNR of 30 dB. 

Fig. 11. Classification accuracy (step 1), synthetic signals of dimension 36 (CS to 6 samples), BD ∈ [94 110). a) No noise, b) 
SNR of 40 dB, c) SNR of 30 dB. 

Fig. 12. MSE (step 2), reconstructed synthetic signals of dimension 36 (CS to 6 samples), BD ∈ [94 110). a) No noise, b) 
SNR of 40 dB, c) SNR of 30 dB. 

Fig. 13. Classification accuracy (step 1), synthetic signals of dimension 64 (CS to 16 samples), BD ∈ [46 62). a) No noise, b) 
SNR of 40 dB, c) SNR of 30 dB. 

Fig. 14. MSE (step 2), reconstructed synthetic signals of dimension 64 (CS to 16 samples), BD ∈ [46 62). a) No noise, b) SNR 
of 40 dB, c) SNR of 30 dB. 

Fig. 15. Classification accuracy (step 1), synthetic signals of dimension 64 (CS to 16 samples), BD ∈ [78 94). a) No noise, b) 
SNR of 40 dB, c) SNR of 30 dB. 

Fig. 16. MSE (step 2), reconstructed synthetic signals of dimension 64 (CS to 16 samples), BD ∈ [78 94). a) No noise, b) SNR 
of 40 dB, c) SNR of 30 dB. 

Fig. 17. Classification accuracy (step 1), synthetic signals of dimension 64 (CS to 16 samples), BD ∈ [94 110). a) No noise, 
b) SNR of 40 dB, c) SNR of 30 dB. 

Fig. 18. MSE (step 2), reconstructed synthetic signals of dimension 64 (CS to 16 samples), BD ∈ [94 110). a) No noise, b) 
SNR of 40 dB, c) SNR of 30 dB. 

Fig. 19. Classification accuracy (step 1), synthetic signals of dimension 64 (CS to 16 samples), BD ∈ [110 126). a) No noise, 
b) SNR of 40 dB, c) SNR of 30 dB. 

Fig. 20. MSE (step 2), reconstructed synthetic signals of dimension 64 (CS to 16 samples), BD ∈ [110 126). a) No noise, b) 
SNR of 40 dB, c) SNR of 30 dB. 

Fig. 21. Classification accuracy (step 1), synthetic signals of dimension 64 (CS to 16 samples), BD ∈ [126 142). a) No noise, 
b) SNR of 40 dB, c) SNR of 30 dB. 

Fig. 22. MSE (step 2), reconstructed synthetic signals of dimension 64 (CS to 16 samples), BD ∈ [126 142). a) No noise, b) 
SNR of 40 dB, c) SNR of 30 dB. 

Fig. 23. Classification accuracy (step 1), synthetic signals of dimension 64 (CS to 16 samples), BD ∈ [142 +∞). a) No noise, 
b) SNR of 40 dB, c) SNR of 30 dB. 







[Plots for Fig. 24, panels (a)-(c). x-axis: k (samples), 10 to 16. Legend: Random + Opt (MSE), non-adaptive; RIP-AB + Opt (MSE), non-adaptive; IDA + Opt (MI), non-adaptive; IDA + Opt (MI), adaptive; AIDA-SHT + Opt (MI), adaptive.]

Fig. 24. MSE (step 2) reconstructed synthetic signals of dimension 64 (CS to 16 samples), BD ∈ [142, +∞). a) No noise, b) SNR of 40 dB, c) SNR of 30 dB.






[Plots for Fig. 25, panels (a)-(c). x-axis: k (samples), 1 to 20.]

Fig. 25. Classification accuracy (step 1) synthetic signals of dimension 100 (CS to 20 samples), BD ∈ [30, 46). a) No noise, b) SNR of 40 dB, c) SNR of 30 dB.
Fig. 26. MSE (step 2) reconstructed synthetic signals of dimension 100 (CS to 20 samples), BD ∈ [30, 46). a) No noise, b) SNR of 40 dB, c) SNR of 30 dB.
Fig. 27. Classification accuracy (step 1) synthetic signals of dimension 100 (CS to 20 samples), BD ∈ [46, 62). a) No noise, b) SNR of 40 dB, c) SNR of 30 dB.
Fig. 28. MSE (step 2) reconstructed synthetic signals of dimension 100 (CS to 20 samples), BD ∈ [46, 62). a) No noise, b) SNR of 40 dB, c) SNR of 30 dB.
Fig. 29. Classification accuracy (step 1) synthetic signals of dimension 100 (CS to 20 samples), BD ∈ [62, 78). a) No noise, b) SNR of 40 dB, c) SNR of 31 dB.
Fig. 30. MSE (step 2) reconstructed synthetic signals of dimension 100 (CS to 20 samples), BD ∈ [62, 78). a) No noise, b) SNR of 40 dB, c) SNR of 30 dB.
Fig. 31. Classification accuracy (step 1) synthetic signals of dimension 100 (CS to 20 samples), BD ∈ [78, 94). a) No noise, b) SNR of 40 dB, c) SNR of 31 dB.
Fig. 32. MSE (step 2) reconstructed synthetic signals of dimension 100 (CS to 20 samples), BD ∈ [78, 94). a) No noise, b) SNR of 40 dB, c) SNR of 30 dB.
Fig. 33. Classification accuracy (step 1) synthetic signals of dimension 100 (CS to 20 samples), BD ∈ [94, 110). a) No noise, b) SNR of 40 dB, c) SNR of 31 dB.
Fig. 34. MSE (step 2) reconstructed synthetic signals of dimension 100 (CS to 20 samples), BD ∈ [94, 110). a) No noise, b) SNR of 40 dB, c) SNR of 30 dB.
Fig. 35. Classification accuracy (step 1) synthetic signals of dimension 100 (CS to 20 samples), BD ∈ [110, 126). a) No noise, b) SNR of 40 dB, c) SNR of 31 dB.
Fig. 36. MSE (step 2) reconstructed synthetic signals of dimension 100 (CS to 20 samples), BD ∈ [110, 126). a) No noise, b) SNR of 40 dB, c) SNR of 30 dB.
Fig. 37. Classification accuracy (step 1) synthetic signals of dimension 100 (CS to 20 samples), BD ∈ [126, 142). a) No noise, b) SNR of 40 dB, c) SNR of 31 dB.
Fig. 38. MSE (step 2) reconstructed synthetic signals of dimension 100 (CS to 20 samples), BD ∈ [126, 142). a) No noise, b) SNR of 40 dB, c) SNR of 30 dB.
Fig. 39. Classification accuracy (step 1) synthetic signals of dimension 100 (CS to 20 samples), BD ∈ [142, +∞). a) No noise, b) SNR of 40 dB, c) SNR of 31 dB.
Fig. 40. MSE (step 2) reconstructed synthetic signals of dimension 100 (CS to 20 samples), BD ∈ [142, +∞). a) No noise, b) SNR of 40 dB, c) SNR of 30 dB.
Fig. 41. PSNR (step 2) reconstructed natural images, non-overlapping patches of size 8×8 (CS to 16 samples). a) No noise, b) SNR of 40 dB, c) SNR of 30 dB.







Fig. 43. Reconstructed image from non-overlapping patches of size 6×6 (CS to 6 samples) using the following two-step protocols: a) Original, b) Random + Optimum (MSE) non-adaptive (20.6 dB), c) RIP-AB + Optimum (MSE) non-adaptive (22.3 dB), d) IDA + Optimum (MSE) non-adaptive (23.3 dB), e) IDA + Optimum (MI) adaptive (25.2 dB), and f) AIDA-SHT + Optimum (MI) adaptive (26.2 dB).
Fig. 44. Reconstructed image from non-overlapping patches of size 6×6 (CS to 6 samples) using the following two-step protocols: a) Original, b) Random + Optimum (MSE) non-adaptive (21.3 dB), c) RIP-AB + Optimum (MSE) non-adaptive (23.3 dB), d) IDA + Optimum (MSE) non-adaptive (23.9 dB), e) IDA + Optimum (MI) adaptive (25.5 dB), and f) AIDA-SHT + Optimum (MI) adaptive (26.5 dB).
Fig. 45. Reconstructed image from non-overlapping patches of size 6×6 (CS to 6 samples) using the following two-step protocols: a) Original, b) Random + Optimum (MSE) non-adaptive (26.5 dB), c) RIP-AB + Optimum (MSE) non-adaptive (28 dB), d) IDA + Optimum (MSE) non-adaptive (29.3 dB), e) IDA + Optimum (MI) adaptive (30.5 dB), and f) AIDA-SHT + Optimum (MI) adaptive (32.0 dB).
Fig. 46. Reconstructed image from non-overlapping patches of size 8×8 (CS to 16 samples) using the following two-step protocols: a) Original, b) Random + Optimum (MSE) non-adaptive (22.21 dB), c) RIP-AB + Optimum (MSE) non-adaptive (23.32 dB), d) IDA + Optimum (MSE) non-adaptive (26.01 dB), e) IDA + Optimum (MI) adaptive (27.22 dB), and f) AIDA-SHT + Optimum (MI) adaptive (27.17 dB).
Fig. 47. Reconstructed image from non-overlapping patches of size 8×8 (CS to 16 samples) using the following two-step protocols: a) Original, b) Random + Optimum (MSE) non-adaptive (22.9 dB), c) RIP-AB + Optimum (MSE) non-adaptive (24.3 dB), d) IDA + Optimum (MSE) non-adaptive (27.0 dB), e) IDA + Optimum (MI) adaptive (27.5 dB), and f) AIDA-SHT + Optimum (MI) adaptive (27.4 dB).
Fig. 48. Reconstructed image from non-overlapping patches of size 8×8 (CS to 16 samples) using the following two-step protocols: a) Original, b) Random + Optimum (MSE) non-adaptive (28.3 dB), c) RIP-AB + Optimum (MSE) non-adaptive (29.6 dB), d) IDA + Optimum (MSE) non-adaptive (31.9 dB), e) IDA + Optimum (MI) adaptive (32.9 dB), and f) AIDA-SHT + Optimum (MI) adaptive (32.7 dB).
Fig. 49. Reconstructed image from non-overlapping patches of size 8×8 (CS to 16 samples) using the following two-step protocols: a) Original, b) Random + Optimum (MSE) non-adaptive (29.6 dB), c) RIP-AB + Optimum (MSE) non-adaptive (31.1 dB), d) IDA + Optimum (MSE) non-adaptive (33.1 dB), e) IDA + Optimum (MI) adaptive (33.7 dB), and f) AIDA-SHT + Optimum (MI) adaptive (33.4 dB).
Fig. 50. Reconstructed image from non-overlapping patches of size 10×10 (CS to 20 samples) using the following two-step protocols: a) Original, b) Random + Optimum (MSE) non-adaptive (20.7 dB), c) RIP-AB + Optimum (MSE) non-adaptive (21.3 dB), d) IDA + Optimum (MSE) non-adaptive (25.6 dB), e) IDA + Optimum (MI) adaptive (26.1 dB), and f) AIDA-SHT + Optimum (MI) adaptive (25.4 dB).
Fig. 51. Reconstructed image from non-overlapping patches of size 10×10 (CS to 20 samples) using the following two-step protocols: a) Original, b) Random + Optimum (MSE) non-adaptive (21.9 dB), c) RIP-AB + Optimum (MSE) non-adaptive (22.5 dB), d) IDA + Optimum (MSE) non-adaptive (25.9 dB), e) IDA + Optimum (MI) adaptive (26.2 dB), and f) AIDA-SHT + Optimum (MI) adaptive (25.7 dB).
Fig. 52. Reconstructed image from non-overlapping patches of size 10×10 (CS to 20 samples) using the following two-step protocols: a) Original, b) Random + Optimum (MSE) non-adaptive (26.8 dB), c) RIP-AB + Optimum (MSE) non-adaptive (27.3 dB), d) IDA + Optimum (MSE) non-adaptive (31.1 dB), e) IDA + Optimum (MI) adaptive (31.3 dB), and f) AIDA-SHT + Optimum (MI) adaptive (30.2 dB).
Fig. 53. Reconstructed image from non-overlapping patches of size 10×10 (CS to 20 samples) using the following two-step protocols: a) Original, b) Random + Optimum (MSE) non-adaptive (28.8 dB), c) RIP-AB + Optimum (MSE) non-adaptive (28.4 dB), d) IDA + Optimum (MSE) non-adaptive (31.6 dB), e) IDA + Optimum (MI) adaptive (31.5 dB), and f) AIDA-SHT + Optimum (MI) adaptive (31.2 dB).